[ https://issues.apache.org/jira/browse/PARQUET-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15052677#comment-15052677 ]
Jerry Ylilammi commented on PARQUET-402:
----------------------------------------
The following code doesn't work, but I think it should, because the schema is
defined. I'm not sure whether the problem is in Avro->Pig or in Pig->Parquet;
somewhere the schema gets lost. Adding a foreach/generate step that only
redefines the map as map[chararray] makes it work.
{code}data = LOAD '...' USING org.apache.pig.piggybank.storage.avro.AvroStorage(
'ignore_bad_files',
'schema', '
{"namespace": "...",
"type": "record",
"name": "...",
"fields": [
...
{"name": "headers", "type": ["null", {"type": "map", "values": "string"}]},
...
]
}');
STORE data INTO '...' USING ParquetStorer();{code}
All parquet-pig-bundle releases after version 1.6.0 (which was still published
under Twitter) use a different package naming scheme, so referring to
ParquetStorer by its short name fails with a class-not-found error.
> Apache Pig cannot store Map data type into Parquet format
> ---------------------------------------------------------
>
> Key: PARQUET-402
> URL: https://issues.apache.org/jira/browse/PARQUET-402
> Project: Parquet
> Issue Type: Bug
> Components: parquet-pig
> Affects Versions: 1.6.0, 1.8.1
> Reporter: Jerry Ylilammi
>
> Trying to store a simple map with two entries gives me the following exception:
> {code}table_with_map_data: {my_map: map[]}
> 2015-12-10 11:58:54,478 [main] INFO
> org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is
> deprecated. Instead, use fs.defaultFS
> 2015-12-10 11:58:54,498 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR
> 2999: Unexpected internal error. Invalid map Schema, schema should contain
> exactly one field: my_map: map{code}
> For example, taking any input and doing the following gives me the exception:
> {code}table_with_map_data = FOREACH random_data GENERATE TOMAP('123',
> 'hello', '456', 'world') as (my_map);
> DESCRIBE table_with_map_data;
> STORE table_with_map_data INTO '...' USING ParquetStorer();{code}
> I'm using the latest version of Pig, Apache Pig version 0.15.0 (r1682971),
> compiled Jun 01 2015, 11:44:35,
> and Parquet parquet-pig-bundle-1.6.0.jar.
> EDIT: I noticed Parquet 1.8.1 is out. I switched to it and was forced to
> update the Pig script to use the fully qualified class name for ParquetStorer.
> However, this gives me the same error as 1.6.0.
> {code}STORE table_with_map_data INTO
> '/Users/jerry/tmp/parquet/output/parquet' USING
> org.apache.parquet.pig.ParquetStorer();{code}