[
https://issues.apache.org/jira/browse/PARQUET-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
abel_ke updated PARQUET-1172:
-----------------------------
Description:
When I save a Parquet file with Spark, the schema looks like this:
{noformat}
optional group attref (LIST) {
  repeated group list {
    optional group element {
      optional binary nid (UTF8);
      optional binary nss (UTF8);
    }
  }
}
{noformat}
When I then use parquet-pig-bundle to read this file, the read itself works,
but I run into a problem when I need to access "nid".
If I read another file that was saved by the Pig storer and need the nid list, the Pig command is:
B = foreach A generate value.addr.clientIp_bag.clientIp, value.guid, value.attref.nid;
But to read the version saved by Spark, I need to use this:
B = foreach M generate value.addr.clientIp, value.guid, flatten(value.attref);
C = foreach B generate clientIp, guid, attref::element.nid;
and this command flattens the column.
My question: does the Pig loader have a problem when loading a Parquet file saved by Spark?
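As a rough illustration of the difference described above, here is a plain Python model of one record's attref column. This is not Pig or Parquet code. The sample values and the helper names (project, project_nested) are made up, and the Pig-storer shape is an assumption about how the bag is laid out. The Spark shape follows the schema above, where each list entry is wrapped in an "element" group.

```python
# Hypothetical in-memory model of one record's "attref" column.
# Sample values are invented; nothing here calls Pig or Parquet.

# Assumed shape for a file written by the Pig storer:
# the bag holds the (nid, nss) tuples directly.
pig_written = [
    {"nid": "a", "nss": "x"},
    {"nid": "b", "nss": "y"},
]

# Shape implied by the Spark schema: every list entry wraps
# the struct in an "element" field.
spark_written = [
    {"element": {"nid": "a", "nss": "x"}},
    {"element": {"nid": "b", "nss": "y"}},
]


def project(bag, field):
    """Pick one field from every tuple in the bag (like attref.nid)."""
    return [row[field] for row in bag]


def project_nested(bag, outer, field):
    """Step through the wrapper field first, then pick the value."""
    return [row[outer][field] for row in bag]


nids_pig = project(pig_written, "nid")
nids_spark = project_nested(spark_written, "element", "nid")
```

In this model the direct projection only finds nid in the first shape. The second shape needs an extra hop through "element", which matches the attref::element.nid reference I have to use after flatten. The model only shows the data shapes; it does not say whether Pig can do that extra hop without flattening.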
> Question on pig loader read parquet file
> -----------------------------------------
>
> Key: PARQUET-1172
> URL: https://issues.apache.org/jira/browse/PARQUET-1172
> Project: Parquet
> Issue Type: Bug
> Components: parquet-mr, parquet-pig
> Affects Versions: 1.9.0, 1.9.1
> Reporter: abel_ke