[ https://issues.apache.org/jira/browse/PARQUET-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15052677#comment-15052677 ]

Jerry Ylilammi commented on PARQUET-402:
----------------------------------------


The following code doesn't work. I think it should, because the schema is
defined. I'm not sure whether the problem is in Avro->Pig or Pig->Parquet, but
somewhere the schema gets lost. If I add a foreach/generate step that only
redefines the map as map[chararray], it works.
{code}data = LOAD '...' USING org.apache.pig.piggybank.storage.avro.AvroStorage(
    'ignore_bad_files',
    'schema', '
    {"namespace": "...",
     "type": "record",
     "name": "...",
     "fields": [
        ...
        {"name": "headers", "type": ["null", {"type": "map", "values": "string"}]},
        ...
     ]
    }');

STORE data INTO '...' USING ParquetStorer();{code}
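A minimal sketch of the foreach/generate workaround described above, assuming the relation `data` from the LOAD statement. The relation name `typed_data` is hypothetical, and only the `headers` field is projected for brevity; a real script would carry the other (elided) fields through as well. Presumably this helps because the explicit `map[chararray]` cast gives the map a value schema, which the error in the issue description ("schema should contain exactly one field") suggests ParquetStorer requires.

{code}-- Workaround sketch: re-declare the untyped map as map[chararray]
-- before storing. Only headers is projected here; other fields are elided.
typed_data = FOREACH data GENERATE (map[chararray])headers AS headers;

-- Check that the map now carries a value type.
DESCRIBE typed_data;

STORE typed_data INTO '...' USING ParquetStorer();{code}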

Every parquet-pig-bundle release after 1.6.0 (the Twitter-era version) uses a
different package naming scheme, so you get a class-not-found error for
ParquetStorer.
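One way to keep scripts short after the package rename is a DEFINE alias. This is a sketch that assumes the 1.8.1 class name `org.apache.parquet.pig.ParquetStorer` used in the issue description and assumes the bundle jar is already on Pig's classpath; the alias name `ApacheParquetStorer` is hypothetical.

{code}-- Bind a short, hypothetical alias to the fully qualified class name
-- used by post-1.6.0 bundles (org.apache.parquet.pig package).
DEFINE ApacheParquetStorer org.apache.parquet.pig.ParquetStorer();

STORE data INTO '...' USING ApacheParquetStorer();{code}

This only addresses the class-not-found error; per the issue description, 1.8.1 still fails on an untyped map.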

> Apache Pig cannot store Map data type into Parquet format
> ---------------------------------------------------------
>
>                 Key: PARQUET-402
>                 URL: https://issues.apache.org/jira/browse/PARQUET-402
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-pig
>    Affects Versions: 1.6.0, 1.8.1
>            Reporter: Jerry Ylilammi
>
> Trying to store a simple map with two entries gives me the following exception:
> {code}table_with_map_data: {my_map: map[]}
> 2015-12-10 11:58:54,478 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
> 2015-12-10 11:58:54,498 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error. Invalid map Schema, schema should contain exactly one field: my_map: map{code}
> For example, taking any input and doing this gives me the exception:
> {code}table_with_map_data = FOREACH random_data GENERATE TOMAP('123', 'hello', '456', 'world') as (my_map);
> DESCRIBE table_with_map_data;
> STORE table_with_map_data INTO '...' USING ParquetStorer();{code}
> I'm using the latest version of Pig: Apache Pig version 0.15.0 (r1682971) compiled Jun 01 2015, 11:44:35
> and Parquet: parquet-pig-bundle-1.6.0.jar
> EDIT: I noticed Parquet 1.8.1 is out. I switched to it and was forced to
> update the Pig script to use the fully qualified class name for ParquetStorer.
> However, this gives me the same error as 1.6.0.
> {code}STORE table_with_map_data INTO '/Users/jerry/tmp/parquet/output/parquet' USING org.apache.parquet.pig.ParquetStorer();{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
