Thanks, I was actually in the middle of trying that as I got your email! It fixed my issue.
And I suppose it's no more dangerous to specify my own read schema than it is to specify the desired projection, since the Parquet file doesn't come with its own original schema.

From: Ryan Blue <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Thursday, April 27, 2017 at 1:12 PM
To: Shannon Carey <[email protected]>
Cc: "[email protected]" <[email protected]>
Subject: Re: Avro Schema from Parquet file uses package-less classes

Make sure you're setting the Parquet read schema.

On Thu, Apr 27, 2017 at 10:35 AM, Shannon Carey <[email protected]> wrote:

As far as I can tell, that works for the top-level class, but not for others. When Avro's org.apache.avro.specific.SpecificData#getClass(Schema) attempts to look up the Java class for a "record" field of the top-level class:

    c = ClassUtils.forName(getClassLoader(), getClassName(schema));

it doesn't find it, because the class name from the Parquet schema doesn't match the name of the Java class. As a result, instead of an instance of my Java class being passed to the put(int field$, Object value$) of the generated avro SpecificRecord subclass, a GenericData$Record is passed. Then, a ClassCastException is thrown when the value$ is cast to my Java class.

-Shannon

On 4/27/17, 11:39 AM, "Ryan Blue" <[email protected]> wrote:

>Shannon, you can edit the Avro schema and add those namespaces. Then you
>set that as your read schema for Parquet and it will correctly read the
>data. The Avro schemas don't have to match, they just have to be compatible.
>
>rb
>
>On Thu, Apr 27, 2017 at 9:37 AM, Shannon Carey
><[email protected]> wrote:
>
>> I'm not sure whether I should be asking Parquet people or Avro people
>> about this.
>>
>> I'm reading a Parquet file via Avro. The Parquet file was produced by
>> Spark. The Avro schema that I generated from the file (by deserializing it
>> as a GenericData record & retrieving its schema) uses "record" types that
>> have no "namespace" value. Therefore, when generating Java classes from the
>> Avro schema in order to deserialize the Parquet file to strongly typed
>> objects, the generated Java classes are created in the default package. As
>> you may know, it's basically impossible to interact with Java classes that
>> are defined in the unnamed package.
>>
>> Has anyone else run into this situation? And is there any way to work
>> around it? It seems like I should be able to specify how the types in the
>> Parquet file should map to a Avro namespace/package name... not only for
>> preventing classes in the unnamed package but also for avoiding class name
>> conflicts.
>>
>> Thanks!
>>
>
>
>
>--
>Ryan Blue
>Software Engineer
>Netflix

--
Ryan Blue
Software Engineer
Netflix
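
For anyone finding this thread later, the class lookup failure described in the 10:35 AM message can be illustrated without Avro at all. The sketch below is a simplified stand-in, not Avro's actual code: RecordClassLookup, fullName and lookup are hypothetical names, and java.util.ArrayList stands in for a generated record class. It assumes only that a record's class name is formed as "namespace.name" (or just "name" when there is no namespace) and that a failed lookup yields no class, which is the situation in which a generic record ends up being used instead of the specific one.

```java
public class RecordClassLookup {

    // Hypothetical helper: builds the class name for a record the way a
    // namespaced schema would, i.e. "namespace.name", or the bare name
    // when the schema carries no namespace.
    public static String fullName(String namespace, String name) {
        if (namespace == null || namespace.isEmpty()) {
            return name;
        }
        return namespace + "." + name;
    }

    // Hypothetical helper: returns the Java class for the record, or null
    // when no class with that name can be loaded. A null result models the
    // case where the reader has no specific class to instantiate.
    public static Class<?> lookup(String namespace, String name) {
        try {
            return Class.forName(fullName(namespace, name));
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Without a namespace the lookup uses the bare name and fails.
        System.out.println(lookup(null, "ArrayList"));
        // With the namespace added, the same record name resolves.
        System.out.println(lookup("java.util", "ArrayList"));
    }
}
```

This is why adding the namespaces to every nested record in the read schema, and not only to the top-level one, makes the nested fields resolve to the generated classes.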
