[jira] [Updated] (PARQUET-1172) Question on pig loader read parquet file

abel_ke (JIRA) Wed, 06 Dec 2017 19:48:14 -0800

     [ 
https://issues.apache.org/jira/browse/PARQUET-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


abel_ke updated PARQUET-1172:
-----------------------------
    Description: 
When I save a Parquet file with Spark, the schema looks like this:

 
{noformat}
optional group attref (LIST) {
       repeated group list {
         optional group element {
           optional binary nid (UTF8);
           optional binary nss (UTF8);
         }
       }
     }
{noformat}

I then read this file with parquet-pig-bundle. Loading works, but accessing
"nid" is a problem.

If I read a file saved by pig-storer and need the list of nid values, the Pig
command is:
 B = foreach A generate value.addr.clientIp_bag.clientIp, value.guid , 
value.attref.nid;

But to read the version saved by Spark, I need to use this:
 B = foreach M generate value.addr.clientIp, value.guid , flatten(value.attref);
 C = foreach B generate clientIp, guid, attref::element.nid; 
and this command flattens the column.

My question: does the Pig loader have a problem loading Parquet files saved by
Spark?

  was:
When I save a Parquet file with Spark, the schema looks like this:

 optional group attref (LIST) {
       repeated group list {
         optional group element {
           optional binary nid (UTF8);
           optional binary nss (UTF8);
         }
       }
     }

I then read this file with parquet-pig-bundle. Loading works, but accessing
"nid" is a problem.

If I read a file saved by pig-storer and need the list of nid values, the Pig
command is:
 B = foreach A generate value.addr.clientIp_bag.clientIp, value.guid , 
value.attref.nid;

But to read the version saved by Spark, I need to use this:
 B = foreach M generate value.addr.clientIp, value.guid , flatten(value.attref);
 C = foreach B generate clientIp, guid, attref::element.nid; 
and this command flattens the column.

My question: does the Pig loader have a problem loading Parquet files saved by
Spark?


> Question on pig loader read parquet file 
> -----------------------------------------
>
>                 Key: PARQUET-1172
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1172
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr, parquet-pig
>    Affects Versions: 1.9.0, 1.9.1
>            Reporter: abel_ke
>
> When I save a Parquet file with Spark, the schema looks like this:
>  
> {noformat}
> optional group attref (LIST) {
>        repeated group list {
>          optional group element {
>            optional binary nid (UTF8);
>            optional binary nss (UTF8);
>          }
>        }
>      }
> {noformat}
> I then read this file with parquet-pig-bundle. Loading works, but accessing
> "nid" is a problem.
> If I read a file saved by pig-storer and need the list of nid values, the Pig
> command is:
>  B = foreach A generate value.addr.clientIp_bag.clientIp, value.guid , 
> value.attref.nid;
> But to read the version saved by Spark, I need to use this:
>  B = foreach M generate value.addr.clientIp, value.guid , 
> flatten(value.attref);
>  C = foreach B generate clientIp, guid, attref::element.nid; 
> and this command flattens the column.
> My question: does the Pig loader have a problem loading Parquet files saved by
> Spark?
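
The difference in nesting described in the report can be illustrated with a minimal Python sketch. This is only a hypothetical in-memory model, not parquet-pig or Spark code: the dictionaries `spark_style` and `flat_style` and the helper `nid_list` are made-up names. The first layout follows the schema quoted above, where each list entry wraps its fields in an `element` group. The second layout is an assumption, inferred from the fact that `value.attref.nid` works on the pig-storer file, in which each entry holds the fields directly.

```python
# Hypothetical model of one record's "attref" column; not a real Parquet or Pig API.

# Layout following the schema in the report: every list entry wraps its
# fields in an "element" group.
spark_style = {
    "attref": [
        {"element": {"nid": "n1", "nss": "s1"}},
        {"element": {"nid": "n2", "nss": "s2"}},
    ]
}

# Assumed layout for the other file: entries hold the fields directly,
# so "nid" is reachable without an intermediate "element" step.
flat_style = {
    "attref": [
        {"nid": "n1", "nss": "s1"},
        {"nid": "n2", "nss": "s2"},
    ]
}


def nid_list(record):
    """Collect the nid values of a record, unwrapping "element" when present."""
    values = []
    for entry in record["attref"]:
        # Fall back to the entry itself when there is no "element" wrapper.
        fields = entry.get("element", entry)
        values.append(fields["nid"])
    return values
```

Under these assumptions, both layouts yield the same list of nid values once the extra `element` level is unwrapped, which is the result the reporter wants without having to flatten the column.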



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
