Parquet error while saving in HDFS

2017-07-24 Thread unk1102
:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745)

Re: parquet error

2015-09-18 Thread Chengi Liu
Hi, I did some digging. I believe the error is caused by the jets3t jar. Essentially these lines: locals: { 'org/apache/hadoop/fs/s3native/Jets3tNativeFileSystemStore', 'java/net/URI', 'org/apache/hadoop/conf/Configuration', 'org/apache/hadoop/fs/s3/S3Credentials',

Re: parquet error

2015-09-18 Thread Cheng Lian
Not sure what's happening here, but I guess it's probably a dependency version issue. Could you please give vanilla Apache Spark a try to see whether it's a CDH-specific issue or not? Cheng On 9/17/15 11:44 PM, Chengi Liu wrote: Hi, I did some digging. I believe the error is caused by

parquet error

2015-09-16 Thread Chengi Liu
Hi, I have a Spark cluster set up and I am trying to write data to S3 in Parquet format. Here is what I am doing:
    df = sqlContext.load('test', 'com.databricks.spark.avro')
    df.saveAsParquetFile("s3n://test")
But I get a nasty error: Py4JJavaError: An error occurred while calling
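
For readers on newer Spark releases: `sqlContext.load` and `saveAsParquetFile` were deprecated in Spark 1.4 in favor of the `DataFrameReader` / `DataFrameWriter` API. A minimal sketch of the same Avro-to-Parquet step in that style follows. The function name `avro_to_parquet`, the bucket paths, and the choice of the `s3a://` scheme are assumptions for illustration, and the short format name `"avro"` assumes the external spark-avro module is on the classpath. It does not address the jets3t error discussed in the replies.

```python
def avro_to_parquet(spark, src, dst, avro_format="avro"):
    """Read Avro data from `src` and write it to `dst` as Parquet.

    `spark` is expected to be a pyspark.sql.SparkSession (assumption);
    the helper itself is hypothetical and only chains standard
    DataFrameReader / DataFrameWriter calls.
    """
    # DataFrameReader: pick the source format, then load the path.
    df = spark.read.format(avro_format).load(src)
    # DataFrameWriter: write Parquet files; by default this fails if `dst` exists.
    df.write.parquet(dst)
    return df


# Example usage (requires a running Spark installation; paths are placeholders):
#
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.appName("avro-to-parquet").getOrCreate()
#   avro_to_parquet(spark, "s3a://example-bucket/input", "s3a://example-bucket/output")
```

On older clusters that still use the Databricks Avro package, passing `avro_format="com.databricks.spark.avro"` keeps the format name from the original post.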

Re: Parquet error reading data that contains array of structs

2015-04-29 Thread Cheng Lian
Thanks for the detailed information! Now I can confirm that this is a backwards-compatibility issue. The data written by Parquet 1.6.0rc7 follows the standard LIST structure. However, Spark SQL still uses the old parquet-avro-style two-level structure, which causes the problem. Cheng On 4/27/15
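
To make the two-level versus standard LIST distinction concrete, here is a small pure-Python sketch. It is modelled on the backward-compatibility rules for LIST described in the Parquet format's logical types documentation, as I understand them. The `Field` class, the function names, and the example column `ips` are hypothetical and not part of any Parquet or Spark API. The question it answers is whether the repeated field inside a LIST group is itself the element (legacy two-level layout) or only a wrapper around the element (standard three-level layout).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Field:
    """Minimal stand-in for a Parquet schema node (hypothetical type)."""
    name: str
    repetition: str  # "required", "optional" or "repeated"
    children: List["Field"] = field(default_factory=list)  # empty for primitives

    @property
    def is_group(self) -> bool:
        return bool(self.children)


def repeated_field_is_element(repeated: Field, list_name: str) -> bool:
    """True when the repeated field itself is the list element (two-level layout)."""
    if not repeated.is_group:
        return True  # repeated primitive, e.g. `repeated int32 array`
    if len(repeated.children) > 1:
        return True  # repeated group with several fields is a struct element
    if repeated.name == "array":
        return True  # name used by older parquet-avro writers
    if repeated.name == f"{list_name}_tuple":
        return True  # name used by parquet-thrift writers
    return False     # single-field wrapper group: three-level layout


def describe(list_group: Field) -> str:
    """Classify a LIST-annotated group that has exactly one repeated child."""
    (repeated,) = list_group.children
    if repeated_field_is_element(repeated, list_group.name):
        return "two-level (legacy)"
    return "three-level (standard)"


# optional group ips (LIST) { repeated group list { optional binary element; } }
standard = Field("ips", "optional",
                 [Field("list", "repeated", [Field("element", "optional")])])

# optional group ips (LIST) { repeated binary array; }
legacy = Field("ips", "optional", [Field("array", "repeated")])

print(describe(standard))  # three-level (standard)
print(describe(legacy))    # two-level (legacy)
```

A reader that only understands the two-level form would treat the `list` wrapper above as the element itself, which is one way a mismatch like the one in this thread can arise.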

Re: Parquet error reading data that contains array of structs

2015-04-27 Thread Jianshi Huang
FYI, Parquet schema output:
message pig_schema {
  optional binary cust_id (UTF8);
  optional int32 part_num;
  optional group ip_list (LIST) {
    repeated group ip_t {
      optional binary ip (UTF8);
    }
  }
  optional group vid_list (LIST) {
    repeated group vid_t {
      optional binary

Re: Parquet error reading data that contains array of structs

2015-04-26 Thread Jianshi Huang
Hi Huai, I'm using Spark 1.3.1. You're right. The dataset is not generated by Spark. It's generated by Pig using Parquet 1.6.0rc7 jars. Let me see if I can send a testing dataset to you... Jianshi On Sat, Apr 25, 2015 at 2:22 AM, Yin Huai yh...@databricks.com wrote: oh, I missed that. It

Re: Parquet error reading data that contains array of structs

2015-04-26 Thread Cheng Lian
Had an offline discussion with Jianshi; the dataset was generated by Pig. Jianshi - Could you please attach the output of parquet-schema path-to-parquet-file? I guess this is a Parquet format backwards-compatibility issue. Parquet hadn't standardized the representation of LIST and MAP until

Parquet error reading data that contains array of structs

2015-04-24 Thread Jianshi Huang
Hi, My data looks like this:
+-----------+------------+----------+
| col_name  | data_type  | comment  |
+-----------+------------+----------+
| cust_id   | string     |          |
| part_num  | int        |

Re: Parquet error reading data that contains array of structs

2015-04-24 Thread Ted Yu
Yin: Fix Version of SPARK-4520 is not set. I assume it was fixed in 1.3.0. Cheers On Fri, Apr 24, 2015 at 11:00 AM, Yin Huai yh...@databricks.com wrote: The exception looks like the one mentioned in https://issues.apache.org/jira/browse/SPARK-4520. What is the version of Spark?

Re: Parquet error reading data that contains array of structs

2015-04-24 Thread Yin Huai
Oh, I missed that. It is fixed in 1.3.0. Also, Jianshi, the dataset was not generated by Spark SQL, right? On Fri, Apr 24, 2015 at 11:09 AM, Ted Yu yuzhih...@gmail.com wrote: Yin: Fix Version of SPARK-4520 is not set. I assume it was fixed in 1.3.0. Cheers On Fri, Apr 24,

Re: Parquet error reading data that contains array of structs

2015-04-24 Thread Yin Huai
The exception looks like the one mentioned in https://issues.apache.org/jira/browse/SPARK-4520. What is the version of Spark? On Fri, Apr 24, 2015 at 2:40 AM, Jianshi Huang jianshi.hu...@gmail.com wrote: Hi, My data looks like this: +---++--+ |