Make sure you're setting the Parquet read schema.

On Thu, Apr 27, 2017 at 10:35 AM, Shannon Carey <[email protected]> wrote:
> As far as I can tell, that works for the top-level class, but not for
> others. When Avro's org.apache.avro.specific.SpecificData#getClass(Schema)
> attempts to look up the Java class for a "record" field of the top-level
> class:
>
>     c = ClassUtils.forName(getClassLoader(), getClassName(schema));
>
> it doesn't find it, because the class name from the Parquet schema doesn't
> match the name of the Java class. As a result, instead of an instance of my
> Java class being passed to the put(int field$, Object value$) of the
> generated avro SpecificRecord subclass, a GenericData$Record is passed.
> Then, a ClassCastException is thrown when the value$ is cast to my Java
> class.
>
> -Shannon
>
> On 4/27/17, 11:39 AM, "Ryan Blue" <[email protected]> wrote:
>
> >Shannon, you can edit the Avro schema and add those namespaces. Then you
> >set that as your read schema for Parquet and it will correctly read the
> >data. The Avro schemas don't have to match, they just have to be
> >compatible.
> >
> >rb
> >
> >On Thu, Apr 27, 2017 at 9:37 AM, Shannon Carey <[email protected]> wrote:
> >
> >> I'm not sure whether I should be asking Parquet people or Avro people
> >> about this.
> >>
> >> I'm reading a Parquet file via Avro. The Parquet file was produced by
> >> Spark. The Avro schema that I generated from the file (by deserializing
> >> it as a GenericData record & retrieving its schema) uses "record" types
> >> that have no "namespace" value. Therefore, when generating Java classes
> >> from the Avro schema in order to deserialize the Parquet file to strongly
> >> typed objects, the generated Java classes are created in the default
> >> package. As you may know, it's basically impossible to interact with Java
> >> classes that are defined in the unnamed package.
> >>
> >> Has anyone else run into this situation? And is there any way to work
> >> around it? It seems like I should be able to specify how the types in the
> >> Parquet file should map to an Avro namespace/package name... not only for
> >> preventing classes in the unnamed package but also for avoiding class
> >> name conflicts.
> >>
> >> Thanks!
> >>
> >
> >--
> >Ryan Blue
> >Software Engineer
> >Netflix
>

--
Ryan Blue
Software Engineer
Netflix
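
For anyone following the thread, here is a minimal sketch of the "edit the Avro schema and add those namespaces" step, using only the JDK. It assumes the schema JSON has already been parsed into nested Map/List values; the class NamespaceSketch, the method withNamespace, the package name com.example.model, and the sample schema are all hypothetical and not part of Avro or Parquet. Avro lets a nested named type inherit the namespace of its enclosing type, so setting it on the top-level record alone may be enough; the sketch sets it explicitly on every record, enum, and fixed type so the result is easy to inspect. The resulting JSON would then be parsed with Avro's Schema.Parser and, assuming this is the setting meant above, passed to parquet-avro's AvroReadSupport.setAvroReadSchema(conf, schema) before reading.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Avro or Parquet.
class NamespaceSketch {

    /**
     * Returns a copy of a parsed Avro schema node in which every named type
     * (record, enum, fixed) that has no "namespace" gets the given one.
     * Assumption: field defaults do not themselves look like named-type
     * declarations, since the walk does not distinguish defaults from types.
     */
    static Object withNamespace(Object node, String namespace) {
        if (node instanceof Map) {
            Map<String, Object> copy = new LinkedHashMap<>();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
                copy.put(String.valueOf(e.getKey()), withNamespace(e.getValue(), namespace));
            }
            Object type = copy.get("type");
            boolean named = "record".equals(type) || "enum".equals(type) || "fixed".equals(type);
            if (named && !copy.containsKey("namespace")) {
                copy.put("namespace", namespace);
            }
            return copy;
        }
        if (node instanceof List) {
            // Unions and field lists: rewrite each element.
            List<Object> copy = new ArrayList<>();
            for (Object item : (List<?>) node) {
                copy.add(withNamespace(item, namespace));
            }
            return copy;
        }
        return node; // strings such as primitive type names and field names
    }

    public static void main(String[] args) {
        // A made-up schema with a nested record and no namespaces,
        // similar in shape to what the thread describes.
        Map<String, Object> address = Map.of(
                "type", "record",
                "name", "address",
                "fields", List.of(Map.of("name", "city", "type", "string")));
        Map<String, Object> top = Map.of(
                "type", "record",
                "name", "spark_schema",
                "fields", List.of(Map.of("name", "address", "type", List.of("null", address))));

        System.out.println(withNamespace(top, "com.example.model"));
    }
}
```

The input is copied rather than modified, so the schema as read from the file stays available for comparison. Only the namespaces change; the record and field names are left alone, which keeps the edited schema compatible with the one in the file.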
