Hi,

I have a question about 'avro.schema.url'. I have a partitioned table 
with a huge number of partitions, defined like this:





CREATE TABLE episodes_partitioned
PARTITIONED BY (doctor_pt INT)
ROW FORMAT
SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES (
'avro.schema.url'='hdfs:///user/YOURUSER/examples/schema/twitter.avsc'
);






I found that several methods call 
AvroSerdeUtils.determineSchemaOrThrowException. If 'avro.schema.url' is defined, 
it calls getSchemaFromFS to read the schema. Because getSchemaFromFS is called 
once for every partition, this generates a huge number of RPC calls. So my 
question is: is there a better way to avoid this, other than defining 
avro.schema.literal in the CREATE TABLE statement?


Stack trace of a call to AvroSerdeUtils.determineSchemaOrThrowException:

        at org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.determineSchemaOrThrowException(AvroSerdeUtils.java:109)
        at org.apache.hadoop.hive.serde2.avro.AvroSerDe.determineSchemaOrReturnErrorSchema(AvroSerDe.java:191)
        at org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:110)
        at org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:83)
        at org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(SerDeUtils.java:540)
        at org.apache.hadoop.hive.ql.plan.PartitionDesc.getDeserializer(PartitionDesc.java:184)
        at org.apache.hadoop.hive.ql.exec.MapOperator.getConvertedOI(MapOperator.java:295)
        at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:423)
        at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:106)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(NativeMethodAccessorImpl.java:-1)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        ...

AvroSerdeUtils#determineSchemaOrThrowException:


  public static Schema determineSchemaOrThrowException(Configuration conf, Properties properties)
      throws IOException, AvroSerdeException {
    …..

    try {
      // If avro.schema.url is defined, the schema has to be read from HDFS.
      Schema s = getSchemaFromFS(schemaString, conf);
      if (s == null) {
        // In case the schema is not on a file system.
        return AvroSerdeUtils.getSchemaFor(new URL(schemaString));
      }
      return s;
    } catch (IOException ioe) {
      throw new AvroSerdeException("Unable to read schema from given path: " + schemaString, ioe);
    } catch (URISyntaxException urie) {
      throw new AvroSerdeException("Unable to read schema from given path: " + schemaString, urie);
    }

    …..
  }
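
To make the cost concrete, here is a minimal sketch of the kind of per-process caching I have in mind. It keeps one schema per URL, so a repeated lookup for the same URL does not hit the file system again. SchemaUrlCache and its loader function are hypothetical names for illustration only and are not Hive classes. The lambda stands in for the expensive read that getSchemaFromFS does, and the schema is kept as a plain string so the sketch runs without Avro or Hadoop on the classpath. I do not know whether Hive already does something like this internally.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical per-process cache from schema URL to schema text (not a Hive class). */
class SchemaUrlCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> loader;

    SchemaUrlCache(Function<String, String> loader) {
        this.loader = loader;
    }

    /** Returns the schema for the URL, invoking the loader only on the first request. */
    String get(String schemaUrl) {
        return cache.computeIfAbsent(schemaUrl, loader);
    }

    public static void main(String[] args) {
        AtomicInteger remoteReads = new AtomicInteger();
        // Assumption: this lambda stands in for the expensive file system read.
        SchemaUrlCache schemas = new SchemaUrlCache(url -> {
            remoteReads.incrementAndGet();
            return "{\"type\":\"record\",\"name\":\"Tweet\",\"fields\":[]}";
        });

        String url = "hdfs:///user/YOURUSER/examples/schema/twitter.avsc";
        // Simulate initializing the SerDe once per partition.
        for (int partition = 0; partition < 1000; partition++) {
            schemas.get(url);
        }
        System.out.println("remote reads: " + remoteReads.get()); // prints "remote reads: 1"
    }
}
```

With 1000 simulated partitions sharing one URL, the loader runs once instead of 1000 times. A real version would also have to decide when a cached schema becomes stale, which this sketch ignores.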



Could anyone help take a look at this Avro schema problem? Thanks!

Best Regards
ZhangLiyun/Kelly Zhang
