[
https://issues.apache.org/jira/browse/SQOOP-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Qian Xu updated SQOOP-1395:
---------------------------
Description:
Suppose you import a table named "users". Sqoop generates an entity class in
"users.java", which is compiled, submitted, and used by a MapReduce job. If the
target file format is Avro or Parquet, an Avro schema is generated as well. In
Avro terms, the entity class is described as a "record", and the name of that
record is "users".
For Parquet file format handling, we use the Kite SDK to manage Parquet file
reading and writing with minimal effort. Kite requires an Avro schema and
requires all data records to be packed into GenericRecord instances. This is
where the problem arises. Kite reads the schema first and tries to instantiate
a record based on the record name. In this case, Kite tries to instantiate a
"users" class. Unfortunately, the Sqoop-generated class from "users.java" has
the same name, and this conflict causes the MapReduce job to fail.
The patch proposes to change the {{AvroSchemaGenerator}} class so that the
record name gets a prefix. In this example, the record name for the "users"
table changes from "users" to "sqoop_import_users".
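As a rough illustration of the proposed renaming (not the actual patch), the
sketch below derives a prefixed record name from a table name and builds a
minimal Avro record schema as a JSON string. The class name RecordNameSketch,
both helper methods, and the empty field list are hypothetical; only the
"sqoop_import_" prefix is taken from the example above.

```java
// Hypothetical sketch; this is not the code in AvroSchemaGenerator.
class RecordNameSketch {
    // Prefix assumed from the example record name "sqoop_import_users".
    static final String PREFIX = "sqoop_import_";

    // Returns a record name that differs from the generated entity class name.
    static String toAvroRecordName(String tableName) {
        return PREFIX + tableName;
    }

    // Builds a minimal Avro record schema as JSON, with no fields, for
    // illustration only. A real schema would list the table's columns.
    static String toSchemaJson(String tableName) {
        return "{\"type\":\"record\",\"name\":\""
            + toAvroRecordName(tableName)
            + "\",\"fields\":[]}";
    }

    public static void main(String[] args) {
        System.out.println(toAvroRecordName("users"));
        System.out.println(toSchemaJson("users"));
    }
}
```

With this naming, the entity class stays "users" while the Avro record is
"sqoop_import_users", so a lookup by record name no longer matches the
generated class.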
was:
Suppose you import a table named "users". Sqoop generates an entity class in
"users.java", which is compiled, submitted, and used by a MapReduce job. If the
target file format is Avro or Parquet, an Avro schema is generated as well. In
Avro terms, the entity class is described as a "record", and the name of that
record is "users".
For Parquet file format handling, we use the Kite SDK to manage Parquet file
reading and writing with minimal effort. Kite requires an Avro schema and
requires all data records to be packed into GenericRecord instances. This is
where the problem arises. Kite reads the schema first and tries to instantiate
a record based on the record name. In this case, Kite tries to instantiate a
"users" class. Unfortunately, the Sqoop-generated class from "users.java" has
the same name, and this conflict causes the MapReduce job to fail.
To solve this problem, I intend to keep the name of the entity class and the
name of the Avro record different.
The patch will:
Change the record name in the Avro schema.
Remove SqoopAvroRecord, as it is no longer required (ClassWriter.java is
reverted to its previous state).
> Potential naming conflict in Avro schema
> ----------------------------------------
>
> Key: SQOOP-1395
> URL: https://issues.apache.org/jira/browse/SQOOP-1395
> Project: Sqoop
> Issue Type: Sub-task
> Components: tools
> Reporter: Qian Xu
> Assignee: Qian Xu
> Priority: Minor
>