Github user gvramana commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2273#discussion_r187071062
--- Diff:
hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java
---
@@ -151,6 +154,33 @@ public CarbonTable
getOrCreateCarbonTable(Configuration configuration) throws IO
SegmentStatusManager segmentStatusManager = new
SegmentStatusManager(identifier);
SegmentStatusManager.ValidAndInvalidSegmentsInfo segments =
segmentStatusManager
.getValidAndInvalidSegments(loadMetadataDetails,
this.readCommittedScope);
+
+ // For NonTransactional table, compare the schema of all index files
with inferred schema.
+ // If there is a mismatch throw exception. As all files must be of
same schema.
+ if (!carbonTable.getTableInfo().isTransactionalTable()) {
+ SchemaConverter schemaConverter = new
ThriftWrapperSchemaConverterImpl();
+ for (Segment segment : segments.getValidSegments()) {
+ Map<String, String> indexFiles = segment.getCommittedIndexFile();
+ for (Map.Entry<String, String> indexFileEntry :
indexFiles.entrySet()) {
+ Path indexFile = new Path(indexFileEntry.getKey());
+ org.apache.carbondata.format.TableInfo tableInfo =
CarbonUtil.inferSchemaFromIndexFile(
+ indexFile.toString(), carbonTable.getTableName());
+ TableInfo wrapperTableInfo =
schemaConverter.fromExternalToWrapperTableInfo(
+ tableInfo, identifier.getDatabaseName(),
+ identifier.getTableName(),
+ identifier.getTablePath());
+ List<ColumnSchema> indexFileColumnList =
+ wrapperTableInfo.getFactTable().getListOfColumns();
+ List<ColumnSchema> tableColumnList =
+ carbonTable.getTableInfo().getFactTable().getListOfColumns();
+ if (!compareColumnSchemaList(indexFileColumnList,
tableColumnList)) {
+ throw new IOException("All the files schema doesn't match. "
--- End diff --
@kunal642, @sounakr: data files should not be skipped; a clear error should
be given to the user. Otherwise the user will assume the result is correct and
was computed from all files. Along with the exception, the file that has the
mismatch should also be logged so the user can analyse and fix it.
Later, a carbon print tool will be provided for checking the schema of each
carbondata file, which will help the user debug the problem.
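
To illustrate the suggestion, here is a minimal, self-contained sketch of failing fast while naming the offending file. It is not the CarbonData implementation: the class `SchemaMismatchCheck`, the method `validate`, and the representation of a schema as a list of column names are all hypothetical stand-ins for `compareColumnSchemaList` and `ColumnSchema` in the diff, and it uses `java.util.logging` only to stay dependency-free.

```java
import java.io.IOException;
import java.util.List;
import java.util.Map;
import java.util.logging.Logger;

public class SchemaMismatchCheck {
  private static final Logger LOGGER =
      Logger.getLogger(SchemaMismatchCheck.class.getName());

  // Hypothetical stand-in for compareColumnSchemaList: here a schema is
  // just an ordered list of column names, compared element by element.
  static boolean sameSchema(List<String> fileColumns, List<String> tableColumns) {
    return fileColumns.equals(tableColumns);
  }

  // Checks every index file against the table schema. On the first
  // mismatch, logs the file path and throws, so no file is silently skipped.
  static void validate(Map<String, List<String>> columnsByIndexFile,
      List<String> tableColumns) throws IOException {
    for (Map.Entry<String, List<String>> entry : columnsByIndexFile.entrySet()) {
      if (!sameSchema(entry.getValue(), tableColumns)) {
        String message = "Schema of index file " + entry.getKey()
            + " does not match the table schema. Expected columns "
            + tableColumns + " but found " + entry.getValue();
        LOGGER.severe(message);
        throw new IOException(message);
      }
    }
  }
}
```

Putting the file path in both the log entry and the exception message means the user sees which file to inspect even if only one of the two reaches them.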
---