[
https://issues.apache.org/jira/browse/GOBBLIN-1485?focusedWorklogId=615891&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-615891
]
ASF GitHub Bot logged work on GOBBLIN-1485:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 28/Jun/21 22:34
Start Date: 28/Jun/21 22:34
Worklog Time Spent: 10m
Work Description: ZihanLi58 commented on a change in pull request #3324:
URL: https://github.com/apache/gobblin/pull/3324#discussion_r660158526
##########
File path: gobblin-hive-registration/src/main/java/org/apache/gobblin/hive/orc/HiveOrcSerDeManager.java
##########
@@ -264,7 +272,18 @@ private void addSchemaProperties(Path path, HiveRegistrationUnit hiveUnit)
*
*/
protected void addSchemaPropertiesHelper(Path path, HiveRegistrationUnit hiveUnit) throws IOException {
- TypeInfo schema = getSchemaFromLatestFile(path, this.fs);
+ TypeInfo schema;
+ if(props.getPropAsBoolean(HIVE_SPEC_SCHEMA_FROM_WRITER, DEFAULT_HIVE_SPEC_SCHEMA_FROM_WRITER)) {
+ try {
+ Preconditions.checkArgument(props.contains(WRITER_LATEST_SCHEMA));
+ Schema avroSchema = new Schema.Parser().parse(props.getProp(WRITER_LATEST_SCHEMA));
+ schema = TypeInfoUtils.getTypeInfoFromObjectInspector(new AvroObjectInspectorGenerator(avroSchema).getObjectInspector());
Review comment:
Yeah, I was trying to do that. There are a couple of reasons I did it this way:
1. AvroOrcSchemaConverter is currently defined in the gobblin-orc module, and I
don't think it makes sense to introduce a new dependency for the hive
registration module.
2. It's doable to convert a TypeDescription to a TypeInfo, but it would work the
same way: we would need to use OrcUtils to create an ObjectInspector and get the
TypeInfo from it. Since the ORC schema is itself derived from the writer schema,
I think the two results should be the same? I verified one table and it looks
good to me.
What do you think?
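
For illustration, the flag-gated selection in the diff above can be sketched in
plain Java. This is a minimal sketch under stated assumptions, not the Gobblin
implementation: the property keys, the class name, and the resolveSchema helper
are hypothetical, schemas are kept as plain strings instead of Hive TypeInfo
objects, and the fallback to the file-based lookup is an assumption because the
quoted diff is cut off before its catch block.

```java
import java.util.Properties;
import java.util.function.Supplier;

// Hypothetical stand-in for the schema selection step; not Gobblin code.
final class WriterSchemaSelection {
  // Hypothetical property keys, chosen for illustration only.
  static final String SCHEMA_FROM_WRITER = "hive.spec.schema.from.writer";
  static final String WRITER_LATEST_SCHEMA = "writer.latest.schema";

  /**
   * Returns the writer-supplied schema when the flag is enabled and the schema
   * is present. Otherwise it calls the supplier, which stands in for the
   * file-listing lookup (getSchemaFromLatestFile in the diff).
   */
  static String resolveSchema(Properties props, Supplier<String> fromLatestFile) {
    boolean useWriter =
        Boolean.parseBoolean(props.getProperty(SCHEMA_FROM_WRITER, "false"));
    if (useWriter) {
      String writerSchema = props.getProperty(WRITER_LATEST_SCHEMA);
      if (writerSchema != null) {
        // No listing is needed on this path.
        return writerSchema;
      }
    }
    // Assumed fallback: flag disabled or writer schema missing.
    return fromLatestFile.get();
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty(SCHEMA_FROM_WRITER, "true");
    props.setProperty(WRITER_LATEST_SCHEMA, "struct<id:int>");
    System.out.println(resolveSchema(props, () -> "schema-from-latest-file"));
  }
}
```

The supplier is only invoked on the fallback path, which mirrors the goal stated
in the issue: skip the list operations whenever the writer schema is available.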
Issue Time Tracking
-------------------
Worklog Id: (was: 615891)
Time Spent: 1h 10m (was: 1h)
> Enable feature to get schema from writer schema when do hive registration
> -------------------------------------------------------------------------
>
> Key: GOBBLIN-1485
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1485
> Project: Apache Gobblin
> Issue Type: New Feature
> Reporter: Zihan Li
> Priority: Major
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> Enable a feature to get the schema from the writer schema during Hive
> registration, so that we can avoid list operations to get the latest schema