Github user gvramana commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2273#discussion_r187071062
  
    --- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java
 ---
    @@ -151,6 +154,33 @@ public CarbonTable 
getOrCreateCarbonTable(Configuration configuration) throws IO
         SegmentStatusManager segmentStatusManager = new 
SegmentStatusManager(identifier);
         SegmentStatusManager.ValidAndInvalidSegmentsInfo segments = 
segmentStatusManager
             .getValidAndInvalidSegments(loadMetadataDetails, 
this.readCommittedScope);
    +
    +    // For NonTransactional table, compare the schema of all index files 
with inferred schema.
    +    // If there is a mismatch throw exception. As all files must be of 
same schema.
    +    if (!carbonTable.getTableInfo().isTransactionalTable()) {
    +      SchemaConverter schemaConverter = new 
ThriftWrapperSchemaConverterImpl();
    +      for (Segment segment : segments.getValidSegments()) {
    +        Map<String, String> indexFiles = segment.getCommittedIndexFile();
    +        for (Map.Entry<String, String> indexFileEntry : 
indexFiles.entrySet()) {
    +          Path indexFile = new Path(indexFileEntry.getKey());
    +          org.apache.carbondata.format.TableInfo tableInfo = 
CarbonUtil.inferSchemaFromIndexFile(
    +              indexFile.toString(), carbonTable.getTableName());
    +          TableInfo wrapperTableInfo = 
schemaConverter.fromExternalToWrapperTableInfo(
    +              tableInfo, identifier.getDatabaseName(),
    +              identifier.getTableName(),
    +              identifier.getTablePath());
    +          List<ColumnSchema> indexFileColumnList =
    +              wrapperTableInfo.getFactTable().getListOfColumns();
    +          List<ColumnSchema> tableColumnList =
    +              carbonTable.getTableInfo().getFactTable().getListOfColumns();
    +          if (!compareColumnSchemaList(indexFileColumnList, 
tableColumnList)) {
    +            throw new IOException("All the files schema doesn't match. "
    --- End diff --
    
    @kunal642 , @sounakr. Data files should not be skipped, clear error should 
be given to user. Otherwise user thinks that result is correct and is computed 
considering all files. Along with exception, which file has data mismatch also 
needs to be logged for him to analyse further and fix.
    later carbon print tool will be provided for him to check schema of each 
carbondata file, which will help user to debug problem.


---

Reply via email to