[GitHub] carbondata pull request #2273: [CARBONDATA-2442] Fixed: Reading two sdk writ...

sounakr Wed, 09 May 2018 05:57:46 -0700

Github user sounakr commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2273#discussion_r187030343
  
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java ---
    @@ -151,6 +154,33 @@ public CarbonTable getOrCreateCarbonTable(Configuration configuration) throws IO
         SegmentStatusManager segmentStatusManager = new SegmentStatusManager(identifier);
         SegmentStatusManager.ValidAndInvalidSegmentsInfo segments = segmentStatusManager
             .getValidAndInvalidSegments(loadMetadataDetails, this.readCommittedScope);
    +
    +    // For NonTransactional table, compare the schema of all index files with inferred schema.
    +    // If there is a mismatch throw exception. As all files must be of same schema.
    +    if (!carbonTable.getTableInfo().isTransactionalTable()) {
    +      SchemaConverter schemaConverter = new ThriftWrapperSchemaConverterImpl();
    +      for (Segment segment : segments.getValidSegments()) {
    +        Map<String, String> indexFiles = segment.getCommittedIndexFile();
    +        for (Map.Entry<String, String> indexFileEntry : indexFiles.entrySet()) {
    +          Path indexFile = new Path(indexFileEntry.getKey());
    +          org.apache.carbondata.format.TableInfo tableInfo = CarbonUtil.inferSchemaFromIndexFile(
    +              indexFile.toString(), carbonTable.getTableName());
    +          TableInfo wrapperTableInfo = schemaConverter.fromExternalToWrapperTableInfo(
    +              tableInfo, identifier.getDatabaseName(),
    +              identifier.getTableName(),
    +              identifier.getTablePath());
    +          List<ColumnSchema> indexFileColumnList =
    +              wrapperTableInfo.getFactTable().getListOfColumns();
    +          List<ColumnSchema> tableColumnList =
    +              carbonTable.getTableInfo().getFactTable().getListOfColumns();
    +          if (!compareColumnSchemaList(indexFileColumnList, tableColumnList)) {
    +            throw new IOException("All the files schema doesn't match. "
    --- End diff --
    
    @kunal642 I agree with you. The purpose of the SDK is to read whatever files
are present. If there is a schema mismatch, we should not block the output of
the files that have the correct schema.
    Also, in the future we are going to support Merge Schema and show the output
even when the schemas differ.
    It is better to show as much output as can be read with the correct schema
and to log a warning about the presence of files with a different schema.
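
A minimal sketch of the skip-and-warn behaviour suggested above, not the actual change to CarbonTableInputFormat. It assumes each index file's schema can be reduced to an ordered list of column names; the class SchemaFilterSketch, the method filterReadableIndexFiles, and the file names are hypothetical stand-ins, and plain strings replace ColumnSchema so the example runs without CarbonData on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.logging.Logger;

// Hypothetical illustration only: none of these names exist in CarbonData.
public class SchemaFilterSketch {
  private static final Logger LOGGER =
      Logger.getLogger(SchemaFilterSketch.class.getName());

  /**
   * Returns the index files whose column list equals the table's column list.
   * Files with a different schema are skipped and reported with a warning
   * instead of failing the whole read with an IOException.
   */
  public static List<String> filterReadableIndexFiles(
      Map<String, List<String>> columnsByIndexFile, List<String> tableColumns) {
    List<String> readable = new ArrayList<>();
    for (Map.Entry<String, List<String>> entry : columnsByIndexFile.entrySet()) {
      if (entry.getValue().equals(tableColumns)) {
        readable.add(entry.getKey());
      } else {
        // Log and continue, so files with the correct schema are still read.
        LOGGER.warning("Skipping index file with a different schema: " + entry.getKey());
      }
    }
    return readable;
  }

  public static void main(String[] args) {
    // Column names stand in for the inferred schema of each index file.
    Map<String, List<String>> columnsByIndexFile = new LinkedHashMap<>();
    columnsByIndexFile.put("a.carbonindex", Arrays.asList("id", "name"));
    columnsByIndexFile.put("b.carbonindex", Arrays.asList("id", "salary"));
    columnsByIndexFile.put("c.carbonindex", Arrays.asList("id", "name"));

    List<String> readable =
        filterReadableIndexFiles(columnsByIndexFile, Arrays.asList("id", "name"));
    System.out.println(readable);
  }
}
```

In this sketch the mismatching file b.carbonindex is only logged, so the two files that match the table schema remain readable.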
