On 12/04/2014 11:53 AM, Yan Qi wrote:
Hi Ryan,

When I set both the read schema and the request schema to the one with only 4 fields (i.e., a subset of the file schema, Profile.getClassSchema()), I got the following error:

14/12/04 11:48:01 INFO mapred.JobClient: Task Id : attempt_201410141621_22583_m_000000_1, Status : FAILED
parquet.io.ParquetDecodingException: Can not read value at 0 in block 0 in file hdfs://had.ca:9000/tmp/avro/2014_10_14/part-00000
        at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:213)
        at parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:204)
        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:456)
        at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
        at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: java.lang.Cl

attempt_201410141621_22583_m_000000_2: Dec 4, 2014 11:47:56 AM INFO: parquet.hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 1000001 records.
attempt_201410141621_22583_m_000000_2: Dec 4, 2014 11:47:56 AM INFO: parquet.hadoop.InternalParquetRecordReader: at row 0. reading next block
attempt_201410141621_22583_m_000000_2: Dec 4, 2014 11:47:56 AM INFO: parquet.hadoop.InternalParquetRecordReader: block read in memory in 338 ms. row count = 603147
attempt_201410141621_22583_m_000000_2: SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
attempt_201410141621_22583_m_000000_2: SLF4J: Defaulting to no-operation (NOP) logger implementation
attempt_201410141621_22583_m_000000_2: SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.

I am wondering if I set the schema correctly. Can you give me some suggestions?

Thanks,
Yan
Can you send the full log from the task that failed? It looks like it was cut off because you only get the first part in the `hadoop` command output.
Without all the information, I'm guessing that "java.lang.Cl" is a ClassCastException. That would happen if your read schema doesn't have the java-class properties that tell Avro to instantiate your specific object rather than a GenericData.Record.
I recommend taking the schema you are using as the read schema and generating a specific object class from it (call it PartialProfile or something). Then you can use that stripped-down specific object the same way as before and avoid this issue.
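In case it's useful, here is a rough sketch of the job setup I have in mind, assuming you are using AvroParquetInputFormat from parquet-avro. PartialProfile is a placeholder for the specific class you would generate from the stripped-down schema, and setAvroReadSchema may not exist in older parquet-avro versions, so check against the release you have:

```java
import org.apache.avro.Schema;
import org.apache.hadoop.mapreduce.Job;
import parquet.avro.AvroParquetInputFormat;

public class PartialProfileJob {
  public static void configure(Job job) {
    // PartialProfile is hypothetical: a specific class generated from a
    // stripped-down .avsc containing only the 4 fields you need.
    Schema partial = PartialProfile.getClassSchema();

    // Projection: tells Parquet which columns to read from the file.
    AvroParquetInputFormat.setRequestedProjection(job, partial);

    // Read schema: tells Avro how to resolve records on read. Because this
    // schema comes from a generated specific class, it carries the metadata
    // Avro needs to instantiate PartialProfile rather than GenericData.Record.
    AvroParquetInputFormat.setAvroReadSchema(job, partial);

    job.setInputFormatClass(AvroParquetInputFormat.class);
  }
}
```

Using the same generated-class schema for both the projection and the read schema keeps the two consistent, which is usually the simplest way to avoid the cast error.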
-- Ryan Blue Software Engineer Cloudera, Inc.
