[
https://issues.apache.org/jira/browse/PIG-4073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Keren Ouaknine updated PIG-4073:
--------------------------------
Description:
I am executing a parsing query (see page_views_convert_asterix.pig attached +
pigmix.jar)
The query outputs a different result whether it:
a) executed on a single file
b) executed on a folder of two files
The two outputs were too large to attach but you can see the erroneous results
already from their first lines:
Expected results - running on one file:
================================
{"id":0,"user" : "KupnfSFXW]_oFEZFVrA","action" : 1,"timespent" : 9,"ip_addr" :
38,"timestamp" : 35,"estimated_revenue" : -4.238605486656678E7,"page_info":
{"f":"EHm[ER\\QXr","g":"iFQwBwywtbAi","d":"iXTDmYaE","e":"VIuDFnCoBW","b":"LYbTgBVX","c":"o]OQd^yy","a":"TSHysjRq","n":"HikofJJ","l":"BpqMXQH","m":"lAiRwyyJhLS","j":"ibiQvyxr","k":"EHxGYWJM","h":"_gQYvoeBe","i":"keJOUp_"
} , "page_links": {{ {
"f":"WmUnCMpnpgg","g":"PGWWxQBKw[WW","d":"CussyvG_rAr]","e":"RZUSjZv_S_Dh","b":"cDYGBRDCKVrl","c":"YcDM`aGyD","a":"xnbXjDmhfJwi","l":"FgjjZaEwL^@","j":"ODtQaLv`GSv","k":"HlWLRRucI","h":"BF]y_rVh_Ea","i":"\\r]ODckjoUL"
} , {
"f":"VZe`Xe[G","g":"JcxGPR`[`","d":"Qeq\\HgN_gaJk","e":"\\DbThsT`Gar","b":"_OSs`txLnnp[","c":"fj_vboF`OrZ","a":"sqgGamCUruny","h":"qNtrLHV","i":"ySrqlQI[C^@"
} , {
"f":"ov^eekm","g":"CSy]jpA_","d":"iCNeW`ylQw","e":"ciQG`uoC\\kn","b":"QfAFspC\\Ian","c":"eYlFKjtLws","a":"lckQFXeZ","n":"pYbnZU_EwNa","o":"HP^krpEOTSVo","l":"EH]]AWy^WO^[","m":"kqAuNFd","j":"ZhmGaekGbA`b","k":"MXG\\mZfwZTe","h":"JC_TxwcVZ","i":"UkRogKk","w":"lKnrumw\\V^@","v":"DsFlPJ`Jv","u":"vijvBCjHLjk","t":"jUH^iZrncHux","s":"`vBRjrCj","r":"[BrvMD\\ln","q":"fCLkUqkKw","p":"XEOqPBNOk_"
} , {
"f":"l[LkbUw[xJyG","g":"J]eB_BIkn]ux","d":"UMybhxHXO","e":"FTMKnVVAI","b":"FQD\\rnHGK","c":"mqNJvV`YtebF","a":"IvyPuZvB","n":"FOTRgMxIi]Uq","o":"^G\\^LPTZWF","l":"qbPTSkl","m":"EMYsayFNe_T","j":"gRRFGdx","k":"\\oYObBaWt","h":"MDWlMgKTDSS","i":"cm\\x_Kkym","t":"aWRD`Nm^@","s":"PoPoZWwBvM","r":"ttqAgoDKAR","q":"slcrtcLC","p":"B\\a_TCnAk"
} , {
"f":"ESBcvWM","g":"bheg_j^qeeb","d":"UsE_^aslG","e":"LGEuvVYUAa","b":"BleiHUjdwE","c":"[HkS]s]YSbJ_","a":"VftFF`ItY","n":"\\uCYLdDSa","o":"d[m^cKk","l":"]heosu\\ATaGQ","m":"oZmnhdApGu[k","j":"UiZofXu^XS","k":"uGsK\\^wFMA","h":"Dh^lPot_jBMZ","i":"qYweMZrYV","t":"qxDWyEit^@","s":"frF_`kfSRQBM","r":"jowmZjF]D]]","q":"`to`RKYbQNn","p":"lYcSAlb"
} }} }
{"id":1,"user" : "kiP\\R]ouAqPdgl]Ecqk]Iw","action" : 0,"timespent" :
1,"ip_addr" : 14707,"timestamp" : 43,"estimated_revenue" :
-4.359457905991358E8,"page_info": {"f":"D\\[RkWZe...
Non expected results - as folder was containing two files:
===================================
As you can see: the id=0 record seems to have stopped in the middle of the
parsing
{"id":0,"user" : "KupnfSFXW]_oFEZFVrA","action" : 1,"timespent" : 9,"ip_addr" :
38,"timestamp" : 35,"estimated_revenue" : -4.238605486656678E7,"page_info":
{"f":"EHm[ER\\QXr","g"
:"iFQwBwywtbAi","d":"iXTDmYaE","e":"VIuDFnCoBW","b":"LYbTgBVX","c":"o]OQd^yy","a":"TSHysjRq","n":"HikofJJ","l":"BpqMXQH","m":"lAiRwyyJhLS","j":"ibiQvyxr","k":"EHxGYWJM","h":"_gQY
voeBe","i":"keJOUp_" } , "page_links": {{ {
"f":"WmUnCMpnpgg","g":"PGWWxQBKw[WW","d":"CussyvG_rAr]","e":"RZUSjZv_S_Dh","b":"cDYGBRDCKVrl","c":"YcDM`aGyD","a":"xnbXjDmhfJwi","l":
"FgjjZaEwL"c":"fj_vboF`OrZ","a":"sqgGamCUruny","h":"qNtrLHV","i":"ySrqlQI[C"lckQFXeZ","n":"pYbnZU_EwNa","o":"HP^krpEOTSVo","l":"EH]]AWy^WO^[","m":"kqAuNFd","j":"ZhmGaekGbA`b","k":"MXG\\mZfwZTe","h":"JC_TxwcVZ","i":"UkRogKk","w":"lKnrumw\\VJv","u":"vijvBCjHLjk","t":"jUH^iZrncHux","s":"`vBRjrCj","r":"[BrvMD\\ln","q":"fCLkUqkKw","p":"XEOqPBNOk_"
} , { "f":"l[LkbUw[xJyG","g":"J]eB_BIkn]ux","d":"UMybhxHXO","e":"FTMKnVV
AI","b":"FQD\\rnHGK","c":"mqNJvV`YtebF","a":"IvyPuZvB","n":"FOTRgMxIi]Uq","o":"^G\\^LPTZWF","l":"qbPTSkl","m":"EMYsayFNe_T","j":"gRRFGdx","k":"\\oYObBaWt","h":"MDWlMgKTDSS","i":"
cm\\x_Kkym","t":"aWRD`Nm","c":"[HkS]s]YSbJ_","a":"VftFF`ItY","n":"\\uCYLdDSa","o":"d[m^cKk","l":"]heosu\\ATaGQ","m":"oZmnhdApGu[k","j":"UiZofXu^XS","k":"uGsK\\^wFMA","h":"Dh^lPot_jBMZ","i":"qYweMZrYV","
t":"qxDWyEit
{"id":1,"user" : "kiP\\R]ouAqPdgl]Ecqk]Iw","action" : 0,"timespent" :
1,"ip_addr" : 14707,"timestamp" : 43,"estimated_revenue" :
-4.359457905991358E8,"page_info": {"f":"D\\[RkWZe...
> Pig query (parsing) only works on one file rather than on a folder or a list
> of files
> --------------------------------------------------------------------------------------
>
> Key: PIG-4073
> URL: https://issues.apache.org/jira/browse/PIG-4073
> Project: Pig
> Issue Type: Bug
> Components: parser
> Affects Versions: 0.13.0
> Reporter: Keren Ouaknine
> Priority: Critical
> Attachments: page_views_convert_asterix.pig, pigmix.jar
>
>
> I am executing a parsing query (see page_views_convert_asterix.pig attached +
> pigmix.jar)
> The query outputs a different result whether it:
> a) executed on a single file
> b) executed on a folder of two files
> The two outputs were too large to attach but you can see the erroneous
> results already from their first lines:
> Expected results - running on one file:
> ================================
> {"id":0,"user" : "KupnfSFXW]_oFEZFVrA","action" : 1,"timespent" : 9,"ip_addr"
> : 38,"timestamp" : 35,"estimated_revenue" : -4.238605486656678E7,"page_info":
> {"f":"EHm[ER\\QXr","g":"iFQwBwywtbAi","d":"iXTDmYaE","e":"VIuDFnCoBW","b":"LYbTgBVX","c":"o]OQd^yy","a":"TSHysjRq","n":"HikofJJ","l":"BpqMXQH","m":"lAiRwyyJhLS","j":"ibiQvyxr","k":"EHxGYWJM","h":"_gQYvoeBe","i":"keJOUp_"
> } , "page_links": {{ {
> "f":"WmUnCMpnpgg","g":"PGWWxQBKw[WW","d":"CussyvG_rAr]","e":"RZUSjZv_S_Dh","b":"cDYGBRDCKVrl","c":"YcDM`aGyD","a":"xnbXjDmhfJwi","l":"FgjjZaEwL^@","j":"ODtQaLv`GSv","k":"HlWLRRucI","h":"BF]y_rVh_Ea","i":"\\r]ODckjoUL"
> } , {
> "f":"VZe`Xe[G","g":"JcxGPR`[`","d":"Qeq\\HgN_gaJk","e":"\\DbThsT`Gar","b":"_OSs`txLnnp[","c":"fj_vboF`OrZ","a":"sqgGamCUruny","h":"qNtrLHV","i":"ySrqlQI[C^@"
> } , {
> "f":"ov^eekm","g":"CSy]jpA_","d":"iCNeW`ylQw","e":"ciQG`uoC\\kn","b":"QfAFspC\\Ian","c":"eYlFKjtLws","a":"lckQFXeZ","n":"pYbnZU_EwNa","o":"HP^krpEOTSVo","l":"EH]]AWy^WO^[","m":"kqAuNFd","j":"ZhmGaekGbA`b","k":"MXG\\mZfwZTe","h":"JC_TxwcVZ","i":"UkRogKk","w":"lKnrumw\\V^@","v":"DsFlPJ`Jv","u":"vijvBCjHLjk","t":"jUH^iZrncHux","s":"`vBRjrCj","r":"[BrvMD\\ln","q":"fCLkUqkKw","p":"XEOqPBNOk_"
> } , {
> "f":"l[LkbUw[xJyG","g":"J]eB_BIkn]ux","d":"UMybhxHXO","e":"FTMKnVVAI","b":"FQD\\rnHGK","c":"mqNJvV`YtebF","a":"IvyPuZvB","n":"FOTRgMxIi]Uq","o":"^G\\^LPTZWF","l":"qbPTSkl","m":"EMYsayFNe_T","j":"gRRFGdx","k":"\\oYObBaWt","h":"MDWlMgKTDSS","i":"cm\\x_Kkym","t":"aWRD`Nm^@","s":"PoPoZWwBvM","r":"ttqAgoDKAR","q":"slcrtcLC","p":"B\\a_TCnAk"
> } , {
> "f":"ESBcvWM","g":"bheg_j^qeeb","d":"UsE_^aslG","e":"LGEuvVYUAa","b":"BleiHUjdwE","c":"[HkS]s]YSbJ_","a":"VftFF`ItY","n":"\\uCYLdDSa","o":"d[m^cKk","l":"]heosu\\ATaGQ","m":"oZmnhdApGu[k","j":"UiZofXu^XS","k":"uGsK\\^wFMA","h":"Dh^lPot_jBMZ","i":"qYweMZrYV","t":"qxDWyEit^@","s":"frF_`kfSRQBM","r":"jowmZjF]D]]","q":"`to`RKYbQNn","p":"lYcSAlb"
> } }} }
> {"id":1,"user" : "kiP\\R]ouAqPdgl]Ecqk]Iw","action" : 0,"timespent" :
> 1,"ip_addr" : 14707,"timestamp" : 43,"estimated_revenue" :
> -4.359457905991358E8,"page_info": {"f":"D\\[RkWZe...
> Non expected results - as folder was containing two files:
> ===================================
> As you can see: the id=0 record seems to have stopped in the middle of the
> parsing
> {"id":0,"user" : "KupnfSFXW]_oFEZFVrA","action" : 1,"timespent" : 9,"ip_addr"
> : 38,"timestamp" : 35,"estimated_revenue" : -4.238605486656678E7,"page_info":
> {"f":"EHm[ER\\QXr","g"
> :"iFQwBwywtbAi","d":"iXTDmYaE","e":"VIuDFnCoBW","b":"LYbTgBVX","c":"o]OQd^yy","a":"TSHysjRq","n":"HikofJJ","l":"BpqMXQH","m":"lAiRwyyJhLS","j":"ibiQvyxr","k":"EHxGYWJM","h":"_gQY
> voeBe","i":"keJOUp_" } , "page_links": {{ {
> "f":"WmUnCMpnpgg","g":"PGWWxQBKw[WW","d":"CussyvG_rAr]","e":"RZUSjZv_S_Dh","b":"cDYGBRDCKVrl","c":"YcDM`aGyD","a":"xnbXjDmhfJwi","l":
> "FgjjZaEwL"c":"fj_vboF`OrZ","a":"sqgGamCUruny","h":"qNtrLHV","i":"ySrqlQI[C"lckQFXeZ","n":"pYbnZU_EwNa","o":"HP^krpEOTSVo","l":"EH]]AWy^WO^[","m":"kqAuNFd","j":"ZhmGaekGbA`b","k":"MXG\\mZfwZTe","h":"JC_TxwcVZ","i":"UkRogKk","w":"lKnrumw\\VJv","u":"vijvBCjHLjk","t":"jUH^iZrncHux","s":"`vBRjrCj","r":"[BrvMD\\ln","q":"fCLkUqkKw","p":"XEOqPBNOk_"
> } , { "f":"l[LkbUw[xJyG","g":"J]eB_BIkn]ux","d":"UMybhxHXO","e":"FTMKnVV
> AI","b":"FQD\\rnHGK","c":"mqNJvV`YtebF","a":"IvyPuZvB","n":"FOTRgMxIi]Uq","o":"^G\\^LPTZWF","l":"qbPTSkl","m":"EMYsayFNe_T","j":"gRRFGdx","k":"\\oYObBaWt","h":"MDWlMgKTDSS","i":"
> cm\\x_Kkym","t":"aWRD`Nm","c":"[HkS]s]YSbJ_","a":"VftFF`ItY","n":"\\uCYLdDSa","o":"d[m^cKk","l":"]heosu\\ATaGQ","m":"oZmnhdApGu[k","j":"UiZofXu^XS","k":"uGsK\\^wFMA","h":"Dh^lPot_jBMZ","i":"qYweMZrYV","
> t":"qxDWyEit
> {"id":1,"user" : "kiP\\R]ouAqPdgl]Ecqk]Iw","action" : 0,"timespent" :
> 1,"ip_addr" : 14707,"timestamp" : 43,"estimated_revenue" :
> -4.359457905991358E8,"page_info": {"f":"D\\[RkWZe...
--
This message was sent by Atlassian JIRA
(v6.2#6252)