GitHub user kumarvishal09 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2273#discussion_r187116998
--- Diff:
hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java
---
@@ -151,6 +154,33 @@ public CarbonTable
getOrCreateCarbonTable(Configuration configuration) throws IO
SegmentStatusManager segmentStatusManager = new
SegmentStatusManager(identifier);
SegmentStatusManager.ValidAndInvalidSegmentsInfo segments =
segmentStatusManager
.getValidAndInvalidSegments(loadMetadataDetails,
this.readCommittedScope);
+
+ // For NonTransactional table, compare the schema of all index files
with inferred schema.
+ // If there is a mismatch throw exception. As all files must be of
same schema.
+ if (!carbonTable.getTableInfo().isTransactionalTable()) {
+ SchemaConverter schemaConverter = new
ThriftWrapperSchemaConverterImpl();
+ for (Segment segment : segments.getValidSegments()) {
+ Map<String, String> indexFiles = segment.getCommittedIndexFile();
+ for (Map.Entry<String, String> indexFileEntry :
indexFiles.entrySet()) {
+ Path indexFile = new Path(indexFileEntry.getKey());
+ org.apache.carbondata.format.TableInfo tableInfo =
CarbonUtil.inferSchemaFromIndexFile(
+ indexFile.toString(), carbonTable.getTableName());
+ TableInfo wrapperTableInfo =
schemaConverter.fromExternalToWrapperTableInfo(
+ tableInfo, identifier.getDatabaseName(),
+ identifier.getTableName(),
+ identifier.getTablePath());
+ List<ColumnSchema> indexFileColumnList =
+ wrapperTableInfo.getFactTable().getListOfColumns();
+ List<ColumnSchema> tableColumnList =
+ carbonTable.getTableInfo().getFactTable().getListOfColumns();
+ if (!compareColumnSchemaList(indexFileColumnList,
tableColumnList)) {
+ throw new IOException("All the files schema doesn't match. "
--- End diff --
@kunal642 @sounakr I agree with @gvramana: skipping a data file is not
correct, because the query would then miss some records, which is not
acceptable. Blocking the user at write time is not possible either. I think
throwing an exception is the correct behaviour.
@ajantha-bhat Can you please check how Parquet handles a similar scenario?
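To make the fail-fast option concrete, here is a minimal, self-contained sketch of the behaviour discussed above. It is not the PR's code: `SketchColumn`, `SchemaCheckSketch`, `sameColumns` and `validate` are hypothetical names, the column model is reduced to a name and a data type, and the comparison is assumed to be order-sensitive. The real `compareColumnSchemaList` in the diff may compare more attributes.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical stand-in for a column schema; only name and data type are modelled.
final class SketchColumn {
  final String name;
  final String dataType;

  SketchColumn(String name, String dataType) {
    this.name = name;
    this.dataType = dataType;
  }

  @Override public boolean equals(Object o) {
    if (!(o instanceof SketchColumn)) {
      return false;
    }
    SketchColumn other = (SketchColumn) o;
    return name.equals(other.name) && dataType.equals(other.dataType);
  }

  @Override public int hashCode() {
    return 31 * name.hashCode() + dataType.hashCode();
  }
}

// Hypothetical helper illustrating "throw on mismatch" instead of "skip the file".
final class SchemaCheckSketch {

  // Order-sensitive comparison: same size, and equal columns at each position.
  static boolean sameColumns(List<SketchColumn> fileColumns, List<SketchColumn> tableColumns) {
    return fileColumns.equals(tableColumns);
  }

  // Fails the read as soon as one index file disagrees with the table schema,
  // so no records are silently dropped from the result.
  static void validate(String indexFile, List<SketchColumn> fileColumns,
      List<SketchColumn> tableColumns) throws IOException {
    if (!sameColumns(fileColumns, tableColumns)) {
      throw new IOException(
          "Schema of index file " + indexFile + " does not match the table schema");
    }
  }
}
```

Naming the offending index file in the message is a choice made in this sketch only; it would let the user find and remove the file with the mismatched schema.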
---