[GitHub] carbondata pull request #2273: [CARBONDATA-2442] Fixed: Reading two sdk writ...

kunal642 Wed, 09 May 2018 03:03:22 -0700

Github user kunal642 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2273#discussion_r186992244
  
    --- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java
 ---
    @@ -151,6 +154,33 @@ public CarbonTable 
getOrCreateCarbonTable(Configuration configuration) throws IO
         SegmentStatusManager segmentStatusManager = new 
SegmentStatusManager(identifier);
         SegmentStatusManager.ValidAndInvalidSegmentsInfo segments = 
segmentStatusManager
             .getValidAndInvalidSegments(loadMetadataDetails, 
this.readCommittedScope);
    +
    +    // For NonTransactional table, compare the schema of all index files 
with inferred schema.
    +    // If there is a mismatch throw exception. As all files must be of 
same schema.
    +    if (!carbonTable.getTableInfo().isTransactionalTable()) {
    +      SchemaConverter schemaConverter = new 
ThriftWrapperSchemaConverterImpl();
    +      for (Segment segment : segments.getValidSegments()) {
    +        Map<String, String> indexFiles = segment.getCommittedIndexFile();
    +        for (Map.Entry<String, String> indexFileEntry : 
indexFiles.entrySet()) {
    +          Path indexFile = new Path(indexFileEntry.getKey());
    +          org.apache.carbondata.format.TableInfo tableInfo = 
CarbonUtil.inferSchemaFromIndexFile(
    +              indexFile.toString(), carbonTable.getTableName());
    +          TableInfo wrapperTableInfo = 
schemaConverter.fromExternalToWrapperTableInfo(
    +              tableInfo, identifier.getDatabaseName(),
    +              identifier.getTableName(),
    +              identifier.getTablePath());
    +          List<ColumnSchema> indexFileColumnList =
    +              wrapperTableInfo.getFactTable().getListOfColumns();
    +          List<ColumnSchema> tableColumnList =
    +              carbonTable.getTableInfo().getFactTable().getListOfColumns();
    +          if (!compareColumnSchemaList(indexFileColumnList, 
tableColumnList)) {
    +            throw new IOException("All the files schema doesn't match. "
    --- End diff --
    
    I dont think throwing exception is the correct approach. Either we should 
not let the user write data with different schema or we should infer the schema 
from the first index file that was written and skip any index file with 
different schema while preparing splits.
    
    @sounakr @kumarvishal09 what do you think??

---

[GitHub] carbondata pull request #2273: [CARBONDATA-2442] Fixed: Reading two sdk writ...

Reply via email to