bvaradar commented on a change in pull request #1559:
URL: https://github.com/apache/incubator-hudi/pull/1559#discussion_r416891281
##########
File path:
hudi-common/src/main/java/org/apache/hudi/common/table/TableSchemaResolver.java
##########
@@ -145,23 +146,37 @@ public MessageType getDataSchema() throws Exception {
* @return Avro schema for this table
* @throws Exception
*/
- public Schema getTableSchema() throws Exception {
- return convertParquetSchemaToAvro(getDataSchema());
+ public Schema getTableSchemaInAvroFormat() throws Exception {
+ Option<Schema> schemaFromCommitMetadata =
getTableSchemaFromCommitMetadata();
+ return schemaFromCommitMetadata.isPresent() ?
schemaFromCommitMetadata.get() :
+ convertParquetSchemaToAvro(getDataSchema());
+ }
+
+ /**
+ * Gets the schema for a hoodie table in Parquet format.
+ *
+ * @return Parquet schema for the table
+ * @throws Exception
+ */
+ public MessageType getTableSchemaInParquetFormat() throws Exception {
+ Option<Schema> schemaFromCommitMetadata =
getTableSchemaFromCommitMetadata();
+ return schemaFromCommitMetadata.isPresent() ?
convertAvroSchemaToParquet(schemaFromCommitMetadata.get()) :
+ getDataSchema();
}
/**
* Gets the schema for a hoodie table in Avro format from the
HoodieCommitMetadata of the last commit.
*
* @return Avro schema for this table
- * @throws Exception
*/
- public Schema getTableSchemaFromCommitMetadata() throws Exception {
+ private Option<Schema> getTableSchemaFromCommitMetadata() {
try {
HoodieTimeline timeline =
metaClient.getActiveTimeline().getCommitsTimeline().filterCompletedInstants();
byte[] data =
timeline.getInstantDetails(timeline.lastInstant().get()).get();
HoodieCommitMetadata metadata = HoodieCommitMetadata.fromBytes(data,
HoodieCommitMetadata.class);
String existingSchemaStr =
metadata.getMetadata(HoodieCommitMetadata.SCHEMA_KEY);
- return new Schema.Parser().parse(existingSchemaStr);
+ return StringUtils.isNullOrEmpty(existingSchemaStr) ? Option.empty() :
Review comment:
On a related note: as we are starting to rely on the Avro schema being
present in commit metadata, we should store the Avro schema as a first-level
entity in commit metadata instead of in the extra-metadata map, and handle
upgrade/downgrade accordingly (added https://jira.apache.org/jira/browse/HUDI-844).
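To make the suggestion concrete, here is a rough sketch of what a first-level
schema field with a fallback for older commits could look like. Everything in
it is hypothetical: CommitMetadataSketch stands in for HoodieCommitMetadata,
the "schema" key value is a placeholder, and schemas are plain strings rather
than Avro Schema objects. It only illustrates the read path, not the actual
change tracked in HUDI-844.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for HoodieCommitMetadata; not the real Hudi class.
class CommitMetadataSketch {
    // Placeholder key; the real value of SCHEMA_KEY is not shown in this diff.
    static final String SCHEMA_KEY = "schema";

    // Proposed first-level field; null for commits written before the upgrade.
    private final String avroSchema;
    // Existing free-form map where the schema is stored today.
    private final Map<String, String> extraMetadata;

    CommitMetadataSketch(String avroSchema, Map<String, String> extraMetadata) {
        this.avroSchema = avroSchema;
        this.extraMetadata = new HashMap<>(extraMetadata);
    }

    // Prefer the first-level field; fall back to the extra-metadata map so
    // that commits written by older versions remain readable.
    Optional<String> resolveAvroSchema() {
        if (avroSchema != null && !avroSchema.isEmpty()) {
            return Optional.of(avroSchema);
        }
        return Optional.ofNullable(extraMetadata.get(SCHEMA_KEY))
                .filter(s -> !s.isEmpty());
    }

    public static void main(String[] args) {
        Map<String, String> legacyExtra = new HashMap<>();
        legacyExtra.put(SCHEMA_KEY, "\"string\"");
        // Old-style commit: schema only present in the extra-metadata map.
        System.out.println(new CommitMetadataSketch(null, legacyExtra).resolveAvroSchema());
        // New-style commit: first-level field takes precedence.
        System.out.println(new CommitMetadataSketch("\"int\"", legacyExtra).resolveAvroSchema());
    }
}
```

The fallback in resolveAvroSchema is the upgrade half of the story; a
downgrade path would presumably also need writers to keep populating the map
entry for some time.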
##########
File path:
hudi-common/src/main/java/org/apache/hudi/common/table/TableSchemaResolver.java
##########
@@ -145,23 +146,37 @@ public MessageType getDataSchema() throws Exception {
* @return Avro schema for this table
* @throws Exception
*/
- public Schema getTableSchema() throws Exception {
- return convertParquetSchemaToAvro(getDataSchema());
+ public Schema getTableSchemaInAvroFormat() throws Exception {
+ Option<Schema> schemaFromCommitMetadata =
getTableSchemaFromCommitMetadata();
+ return schemaFromCommitMetadata.isPresent() ?
schemaFromCommitMetadata.get() :
+ convertParquetSchemaToAvro(getDataSchema());
+ }
+
+ /**
+ * Gets the schema for a hoodie table in Parquet format.
+ *
+ * @return Parquet schema for the table
+ * @throws Exception
+ */
+ public MessageType getTableSchemaInParquetFormat() throws Exception {
Review comment:
You can introduce a getTableAvroSchemaFromDataFile method that returns the
data-file schema in Avro format.
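Roughly what I have in mind, as a self-contained sketch rather than a patch.
TableSchemaResolverSketch and getTableAvroSchema are hypothetical names, the
schemas are plain strings, and the three constructor arguments stand in for
getTableSchemaFromCommitMetadata, getDataSchema and convertParquetSchemaToAvro
from this PR:

```java
import java.util.Optional;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical stand-in for TableSchemaResolver; schemas are plain strings here.
class TableSchemaResolverSketch {
    private final Supplier<Optional<String>> commitMetadataAvroSchema;
    private final Supplier<String> dataFileParquetSchema;
    private final Function<String, String> parquetToAvro;

    TableSchemaResolverSketch(Supplier<Optional<String>> commitMetadataAvroSchema,
                              Supplier<String> dataFileParquetSchema,
                              Function<String, String> parquetToAvro) {
        this.commitMetadataAvroSchema = commitMetadataAvroSchema;
        this.dataFileParquetSchema = dataFileParquetSchema;
        this.parquetToAvro = parquetToAvro;
    }

    // Reads the Parquet schema from a data file and converts it to Avro.
    String getTableAvroSchemaFromDataFile() {
        return parquetToAvro.apply(dataFileParquetSchema.get());
    }

    // Prefers the schema from commit metadata; the data file is read only
    // when the commit metadata carries no schema.
    String getTableAvroSchema() {
        return commitMetadataAvroSchema.get()
                .orElseGet(this::getTableAvroSchemaFromDataFile);
    }

    public static void main(String[] args) {
        TableSchemaResolverSketch resolver = new TableSchemaResolverSketch(
                Optional::empty,              // no schema in commit metadata
                () -> "parquet-schema",       // pretend data-file schema
                p -> "avro(" + p + ")");      // pretend converter
        System.out.println(resolver.getTableAvroSchema());
    }
}
```

With the data-file path factored out this way, callers that explicitly want
the on-disk schema get it in Avro form without going through the Parquet type.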
##########
File path:
hudi-common/src/main/java/org/apache/hudi/common/table/TableSchemaResolver.java
##########
@@ -178,6 +193,17 @@ public Schema convertParquetSchemaToAvro(MessageType
parquetSchema) {
return avroSchemaConverter.convert(parquetSchema);
}
+ /**
+ * Convert a avro scheme to the parquet format.
+ *
+ * @param schema The avro schema to convert
+ * @return The converted parquet schema
+ */
+ public MessageType convertAvroSchemaToParquet(Schema schema) {
Review comment:
Please check the ParquetUtils class for similar APIs.