Github user sounakr commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2273#discussion_r187030343
--- Diff:
hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java
---
@@ -151,6 +154,33 @@ public CarbonTable
getOrCreateCarbonTable(Configuration configuration) throws IO
SegmentStatusManager segmentStatusManager = new
SegmentStatusManager(identifier);
SegmentStatusManager.ValidAndInvalidSegmentsInfo segments =
segmentStatusManager
.getValidAndInvalidSegments(loadMetadataDetails,
this.readCommittedScope);
+
+ // For NonTransactional table, compare the schema of all index files
with inferred schema.
+ // If there is a mismatch throw exception. As all files must be of
same schema.
+ if (!carbonTable.getTableInfo().isTransactionalTable()) {
+ SchemaConverter schemaConverter = new
ThriftWrapperSchemaConverterImpl();
+ for (Segment segment : segments.getValidSegments()) {
+ Map<String, String> indexFiles = segment.getCommittedIndexFile();
+ for (Map.Entry<String, String> indexFileEntry :
indexFiles.entrySet()) {
+ Path indexFile = new Path(indexFileEntry.getKey());
+ org.apache.carbondata.format.TableInfo tableInfo =
CarbonUtil.inferSchemaFromIndexFile(
+ indexFile.toString(), carbonTable.getTableName());
+ TableInfo wrapperTableInfo =
schemaConverter.fromExternalToWrapperTableInfo(
+ tableInfo, identifier.getDatabaseName(),
+ identifier.getTableName(),
+ identifier.getTablePath());
+ List<ColumnSchema> indexFileColumnList =
+ wrapperTableInfo.getFactTable().getListOfColumns();
+ List<ColumnSchema> tableColumnList =
+ carbonTable.getTableInfo().getFactTable().getListOfColumns();
+ if (!compareColumnSchemaList(indexFileColumnList,
tableColumnList)) {
+ throw new IOException("All the files schema doesn't match. "
--- End diff --
@kunal642 I agree with you. The purpose of SDK is to read whatever file is
present. In case there is a mismatch in the schema we should not block the
output of the files are having correct schema.
Also, in future we are going to support Merge Schema and show the output in
case of different schema.
Better to show the output of how much can be read with the correct schema
and also throw a warning or print the log for the presence of different schema
in the log.
---