[ 
https://issues.apache.org/jira/browse/PIG-4073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Keren Ouaknine updated PIG-4073:
--------------------------------

    Description: 
I am running a parsing query (see the attached page_views_convert_asterix.pig 
and pigmix.jar).
The query produces different output depending on whether it is:
a) executed on a single file
b) executed on a folder containing two files

The two outputs were too large to attach, but the erroneous results are 
already visible in their first lines:
Expected results - running on one file:
================================
{"id":0,"user" : "KupnfSFXW]_oFEZFVrA","action" : 1,"timespent" : 9,"ip_addr" : 
38,"timestamp" : 35,"estimated_revenue" : -4.238605486656678E7,"page_info": 
{"f":"EHm[ER\\QXr","g":"iFQwBwywtbAi","d":"iXTDmYaE","e":"VIuDFnCoBW","b":"LYbTgBVX","c":"o]OQd^yy","a":"TSHysjRq","n":"HikofJJ","l":"BpqMXQH","m":"lAiRwyyJhLS","j":"ibiQvyxr","k":"EHxGYWJM","h":"_gQYvoeBe","i":"keJOUp_"
 } , "page_links": {{  { 
"f":"WmUnCMpnpgg","g":"PGWWxQBKw[WW","d":"CussyvG_rAr]","e":"RZUSjZv_S_Dh","b":"cDYGBRDCKVrl","c":"YcDM`aGyD","a":"xnbXjDmhfJwi","l":"FgjjZaEwL^@","j":"ODtQaLv`GSv","k":"HlWLRRucI","h":"BF]y_rVh_Ea","i":"\\r]ODckjoUL"
 } , { 
"f":"VZe`Xe[G","g":"JcxGPR`[`","d":"Qeq\\HgN_gaJk","e":"\\DbThsT`Gar","b":"_OSs`txLnnp[","c":"fj_vboF`OrZ","a":"sqgGamCUruny","h":"qNtrLHV","i":"ySrqlQI[C^@"
 } , { 
"f":"ov^eekm","g":"CSy]jpA_","d":"iCNeW`ylQw","e":"ciQG`uoC\\kn","b":"QfAFspC\\Ian","c":"eYlFKjtLws","a":"lckQFXeZ","n":"pYbnZU_EwNa","o":"HP^krpEOTSVo","l":"EH]]AWy^WO^[","m":"kqAuNFd","j":"ZhmGaekGbA`b","k":"MXG\\mZfwZTe","h":"JC_TxwcVZ","i":"UkRogKk","w":"lKnrumw\\V^@","v":"DsFlPJ`Jv","u":"vijvBCjHLjk","t":"jUH^iZrncHux","s":"`vBRjrCj","r":"[BrvMD\\ln","q":"fCLkUqkKw","p":"XEOqPBNOk_"
 } , { 
"f":"l[LkbUw[xJyG","g":"J]eB_BIkn]ux","d":"UMybhxHXO","e":"FTMKnVVAI","b":"FQD\\rnHGK","c":"mqNJvV`YtebF","a":"IvyPuZvB","n":"FOTRgMxIi]Uq","o":"^G\\^LPTZWF","l":"qbPTSkl","m":"EMYsayFNe_T","j":"gRRFGdx","k":"\\oYObBaWt","h":"MDWlMgKTDSS","i":"cm\\x_Kkym","t":"aWRD`Nm^@","s":"PoPoZWwBvM","r":"ttqAgoDKAR","q":"slcrtcLC","p":"B\\a_TCnAk"
 } , { 
"f":"ESBcvWM","g":"bheg_j^qeeb","d":"UsE_^aslG","e":"LGEuvVYUAa","b":"BleiHUjdwE","c":"[HkS]s]YSbJ_","a":"VftFF`ItY","n":"\\uCYLdDSa","o":"d[m^cKk","l":"]heosu\\ATaGQ","m":"oZmnhdApGu[k","j":"UiZofXu^XS","k":"uGsK\\^wFMA","h":"Dh^lPot_jBMZ","i":"qYweMZrYV","t":"qxDWyEit^@","s":"frF_`kfSRQBM","r":"jowmZjF]D]]","q":"`to`RKYbQNn","p":"lYcSAlb"
 }  }}  } 
{"id":1,"user" : "kiP\\R]ouAqPdgl]Ecqk]Iw","action" : 0,"timespent" : 
1,"ip_addr" : 14707,"timestamp" : 43,"estimated_revenue" : 
-4.359457905991358E8,"page_info": {"f":"D\\[RkWZe...

Unexpected results - running on a folder containing two files:
===================================
As you can see, the id=0 record seems to have stopped in the middle of 
parsing:
{"id":0,"user" : "KupnfSFXW]_oFEZFVrA","action" : 1,"timespent" : 9,"ip_addr" : 
38,"timestamp" : 35,"estimated_revenue" : -4.238605486656678E7,"page_info": 
{"f":"EHm[ER\\QXr","g"
:"iFQwBwywtbAi","d":"iXTDmYaE","e":"VIuDFnCoBW","b":"LYbTgBVX","c":"o]OQd^yy","a":"TSHysjRq","n":"HikofJJ","l":"BpqMXQH","m":"lAiRwyyJhLS","j":"ibiQvyxr","k":"EHxGYWJM","h":"_gQY
voeBe","i":"keJOUp_" } , "page_links": {{  { 
"f":"WmUnCMpnpgg","g":"PGWWxQBKw[WW","d":"CussyvG_rAr]","e":"RZUSjZv_S_Dh","b":"cDYGBRDCKVrl","c":"YcDM`aGyD","a":"xnbXjDmhfJwi","l":
"FgjjZaEwL"c":"fj_vboF`OrZ","a":"sqgGamCUruny","h":"qNtrLHV","i":"ySrqlQI[C"lckQFXeZ","n":"pYbnZU_EwNa","o":"HP^krpEOTSVo","l":"EH]]AWy^WO^[","m":"kqAuNFd","j":"ZhmGaekGbA`b","k":"MXG\\mZfwZTe","h":"JC_TxwcVZ","i":"UkRogKk","w":"lKnrumw\\VJv","u":"vijvBCjHLjk","t":"jUH^iZrncHux","s":"`vBRjrCj","r":"[BrvMD\\ln","q":"fCLkUqkKw","p":"XEOqPBNOk_"
 } , { "f":"l[LkbUw[xJyG","g":"J]eB_BIkn]ux","d":"UMybhxHXO","e":"FTMKnVV
AI","b":"FQD\\rnHGK","c":"mqNJvV`YtebF","a":"IvyPuZvB","n":"FOTRgMxIi]Uq","o":"^G\\^LPTZWF","l":"qbPTSkl","m":"EMYsayFNe_T","j":"gRRFGdx","k":"\\oYObBaWt","h":"MDWlMgKTDSS","i":"
cm\\x_Kkym","t":"aWRD`Nm","c":"[HkS]s]YSbJ_","a":"VftFF`ItY","n":"\\uCYLdDSa","o":"d[m^cKk","l":"]heosu\\ATaGQ","m":"oZmnhdApGu[k","j":"UiZofXu^XS","k":"uGsK\\^wFMA","h":"Dh^lPot_jBMZ","i":"qYweMZrYV","
t":"qxDWyEit
{"id":1,"user" : "kiP\\R]ouAqPdgl]Ecqk]Iw","action" : 0,"timespent" : 
1,"ip_addr" : 14707,"timestamp" : 43,"estimated_revenue" : 
-4.359457905991358E8,"page_info": {"f":"D\\[RkWZe...
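
Since the full outputs are too large to attach, the point where they first diverge can be located with a short script. The following is only a minimal sketch and not part of the attached files: the function name first_divergence and the file names in the usage comment are hypothetical, and it assumes both outputs are plain text read line by line.

```python
from itertools import zip_longest


def first_divergence(expected_lines, actual_lines):
    """Return (line_number, column) of the first difference, or None if identical.

    line_number is 1-based; column is the 0-based index of the first
    differing character on that line (0 if one input has run out of lines).
    """
    pairs = zip_longest(expected_lines, actual_lines)
    for lineno, (exp, act) in enumerate(pairs, start=1):
        if exp == act:
            continue
        if exp is None or act is None:
            # One output has fewer lines than the other.
            return lineno, 0
        # First differing character; if one line is a prefix of the other,
        # report the position where the shorter line ends.
        col = next(
            (i for i, (a, b) in enumerate(zip(exp, act)) if a != b),
            min(len(exp), len(act)),
        )
        return lineno, col
    return None


# Hypothetical usage (file names are placeholders):
# with open("out_single_file.txt") as f1, open("out_folder.txt") as f2:
#     print(first_divergence(f1, f2))
```

Reporting a line and column rather than a full diff keeps the result small enough to paste into the issue even when the records themselves are very long.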


> Pig query (parsing) only works on one file rather than on a folder or a list 
> of files 
> --------------------------------------------------------------------------------------
>
>                 Key: PIG-4073
>                 URL: https://issues.apache.org/jira/browse/PIG-4073
>             Project: Pig
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.13.0
>            Reporter: Keren Ouaknine
>            Priority: Critical
>         Attachments: page_views_convert_asterix.pig, pigmix.jar
>
>



--
This message was sent by Atlassian JIRA
(v6.2#6252)
