Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2869#discussion_r230251542
--- Diff:
hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonFileInputFormat.java
---
@@ -88,6 +99,50 @@ public CarbonTable getOrCreateCarbonTable(Configuration configuration) throws IO
    }
  }
+  /**
+   * This method will list all the carbondata files in the table path and treat one carbondata
+   * file as one split.
+   */
+  public List<InputSplit> getAllFileSplits(JobContext job) throws IOException {
+    List<InputSplit> splits = new ArrayList<>();
+    CarbonTable carbonTable = getOrCreateCarbonTable(job.getConfiguration());
+    if (null == carbonTable) {
+      throw new IOException("Missing/Corrupt schema file for table.");
+    }
+    for (CarbonFile carbonFile : getAllCarbonDataFiles(carbonTable.getTablePath())) {
+      CarbonInputSplit split =
+          new CarbonInputSplit("null", new Path(carbonFile.getAbsolutePath()), 0,
+              carbonFile.getLength(), carbonFile.getLocations(), FileFormat.COLUMNAR_V3);
+      split.setVersion(ColumnarFormatVersion.V3);
+      BlockletDetailInfo info = new BlockletDetailInfo();
+      split.setDetailInfo(info);
+      info.setBlockSize(carbonFile.getLength());
+      // Read the footer offset and set.
+      FileReader reader = FileFactory
--- End diff --
Reading the file-footer offset should not happen inside getSplits, as it will
increase the getSplits time when there are many files. It should instead be
handled during record reader initialization on the executor side, or inside a
separate thread.
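To illustrate the suggestion, here is a minimal, self-contained sketch of the lazy pattern being proposed. The class and method names (`LazyFooterSplit`, `footerOffset`, `readFooterOffsetFromFile`) are hypothetical stand-ins, not CarbonData APIs: the split built in getSplits records only the path and length, and the footer offset is resolved on first access, i.e. at record reader initialization on the executor.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the driver-side split carries only cheap metadata;
// the footer offset is read lazily on the executor, not during getSplits.
class LazyFooterSplit {
  private final String path;
  private final long length;
  private long footerOffset = -1; // -1 means "not yet read"

  LazyFooterSplit(String path, long length) {
    this.path = path;
    this.length = length;
  }

  // Called once per split during record reader initialization.
  long footerOffset() {
    if (footerOffset < 0) {
      footerOffset = readFooterOffsetFromFile();
    }
    return footerOffset;
  }

  // Stand-in for the actual I/O (e.g. reading the trailing bytes of the
  // file); here a placeholder computation so the sketch is runnable.
  private long readFooterOffsetFromFile() {
    return length - 8;
  }
}

public class Main {
  public static void main(String[] args) {
    // Driver side: building splits performs no footer I/O at all.
    List<LazyFooterSplit> splits = new ArrayList<>();
    splits.add(new LazyFooterSplit("/data/part-0.carbondata", 1024));

    // Executor side: the first access triggers the single footer read.
    System.out.println(splits.get(0).footerOffset());
  }
}
```

This keeps getSplits proportional to a file listing regardless of file count, and spreads the footer reads across executors instead of serializing them on the driver.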
---