GitHub user sounakr opened a pull request:
https://github.com/apache/carbondata/pull/2032
[CARBONDATA-2224] External File level reader support
The file level reader reads any CarbonData file placed at an external file
path. Reading can be done through three methods.
a) Reading as a datasource from Spark. CarbonFileLevelFormat.scala is used
in this case to read the file. To create a Spark datasource external table:
" CREATE TABLE sdkOutputTable **USING CarbonDataFileFormat** LOCATION
'$writerOutputFilePath1'"
For more details please refer to the test file
org/apache/carbondata/spark/testsuite/createTable/TestCreateTableUsingCarbonFileLevelFormat.scala.
b) Reading from Spark SQL as an external table. CarbonFileInputFormat.java
is used for reading the files. The create table syntax for this is:
"CREATE EXTERNAL TABLE sdkOutputTable **STORED BY 'carbondatafileformat'**
LOCATION '$writerOutputFilePath6'"
For more details please refer to
org/apache/carbondata/spark/testsuite/createTable/TestCarbonFileInputFormatWithExternalCarbonTable.scala.
c) Reading through a Hadoop MapReduce job. Please refer to
org/apache/carbondata/mapred/TestMapReduceCarbonFileInputFormat.java for more
details.
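As a quick sketch, methods (a) and (b) above amount to the following Spark SQL statements. The paths such as '$writerOutputFilePath1' are placeholders taken from this PR's tests, not real locations, and the final SELECT is only an illustrative usage example:

```sql
-- (a) Spark datasource table over a CarbonData file written externally,
--     read via CarbonFileLevelFormat
CREATE TABLE sdkOutputTable
USING CarbonDataFileFormat
LOCATION '$writerOutputFilePath1';

-- (b) External table read through CarbonFileInputFormat
CREATE EXTERNAL TABLE sdkOutputTable
STORED BY 'carbondatafileformat'
LOCATION '$writerOutputFilePath6';

-- Either table can then be queried normally
SELECT * FROM sdkOutputTable;
```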
- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [ ] Testing done
Please provide details on
- Whether new unit test cases have been added or why no new tests
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance
test report.
- Any additional information to help reviewers in testing this
change.
- [ ] For large changes, please consider breaking it into sub-tasks under
an umbrella JIRA.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/sounakr/incubator-carbondata file_level_reader
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/carbondata/pull/2032.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2032
----
commit 65ce23b1f6e35c3c6722c7f0c14c19b7c8536d23
Author: Jacky Li <jacky.likun@...>
Date: 2018-01-06T12:28:44Z
[CARBONDATA-1992] Remove partitionId in CarbonTablePath
In CarbonTablePath, there is a deprecated partition id which is always 0;
it should be removed to avoid confusion.
This closes #1765
commit c9ceaaae66574c98a13cc65bc3b91ab8346a456b
Author: Jacky Li <jacky.likun@...>
Date: 2018-01-30T13:24:04Z
[CARBONDATA-2099] Refactor query scan process to improve readability
Unified concepts in the scan process flow:
1. QueryModel contains all parameters for a scan; it is created by an API in
CarbonTable. (In future, CarbonTable will be the entry point for various table
operations.)
2. Use the term ColumnChunk to represent one column in one blocklet, and use
ChunkIndex in the reader to read a specified column chunk.
3. Use the term ColumnPage to represent one page in one ColumnChunk.
4. QueryColumn => ProjectionColumn, indicating it is for projection.
This closes #1874
commit 01fcd539af815956975eb4ea480f14e4bb1a2062
Author: ravipesala <ravi.pesala@...>
Date: 2017-11-15T14:18:40Z
[CARBONDATA-1544][Datamap] Datamap FineGrain implementation
Implemented interfaces for FG datamap and integrated to filterscanner to
use the pruned bitset from FG datamap.
The FG query flow is as follows.
1. The user can add an FG datamap to any table and implement its interfaces.
2. Any filter query which hits a table with a datamap will call the prune
method of the FG datamap.
3. The prune method of FGDatamap returns a list of FineGrainBlocklet; these
blocklets contain block, blocklet, page, and rowid information.
4. The pruned blocklets are internally written to a file, and only the
block, blocklet, and file path information is returned as part of the splits.
5. Based on the splits, the scan RDD schedules the tasks.
6. In the filter scanner, we check the datamap writer path from the split,
read the bitset if it exists, and pass this bitset as input to the scanner.
This closes #1471
commit da82cdbda4f45fa741f56594e23c61a575c2fd2c
Author: Jacky Li <jacky.likun@...>
Date: 2018-02-27T00:51:25Z
[REBASE] resolve conflict after rebasing to master
commit 072c95a6770a2b847e111f3349df271bade62675
Author: Jacky Li <jacky.likun@...>
Date: 2018-02-10T02:34:59Z
Revert "[CARBONDATA-2023][DataLoad] Add size base block allocation in data
loading"
This reverts commit 6dd8b038fc898dbf48ad30adfc870c19eb38e3d0.
commit 50af4d91ca2415d12e559b6070f72bfe5a881641
Author: Jacky Li <jacky.likun@...>
Date: 2018-02-11T13:37:04Z
[CARBONDATA-2159] Remove carbon-spark dependency in store-sdk module
To allow building an assembly JAR of the store-sdk module, it should not
depend on the carbon-spark module.
This closes #1970
commit e77fcac978a87d9d526ea7012954fc8e48e9e34c
Author: xuchuanyin <xuchuanyin@...>
Date: 2018-02-08T06:42:39Z
[CARBONDATA-2023][DataLoad] Add size base block allocation in data loading
CarbonData assigns blocks to nodes at the beginning of data loading.
The previous block allocation strategy was based on block count, and it
suffers from data skew when the sizes of the input files differ a lot.
We introduce a size-based block allocation strategy to optimize data
loading performance in skewed data scenarios.
This closes #1808
commit 00e5208a6da5cc13aabd3ed6c437d2d1c5fa06ff
Author: sounakr <sounakr@...>
Date: 2017-09-28T10:51:05Z
[CARBONDATA-1480]Min Max Index Example for DataMap
A DataMap example: implementation of a Min Max Index through DataMap, and
use of the index while pruning.
This closes #1359
commit 3212c0c025191c754c454ad88de3adbec26dc58b
Author: ravipesala <ravi.pesala@...>
Date: 2017-11-15T14:18:40Z
[CARBONDATA-1544][Datamap] Datamap FineGrain implementation
Implemented interfaces for FG datamap and integrated to filterscanner to
use the pruned bitset from FG datamap.
The FG query flow is as follows.
1. The user can add an FG datamap to any table and implement its interfaces.
2. Any filter query which hits a table with a datamap will call the prune
method of the FG datamap.
3. The prune method of FGDatamap returns a list of FineGrainBlocklet; these
blocklets contain block, blocklet, page, and rowid information.
4. The pruned blocklets are internally written to a file, and only the
block, blocklet, and file path information is returned as part of the splits.
5. Based on the splits, the scan RDD schedules the tasks.
6. In the filter scanner, we check the datamap writer path from the split,
read the bitset if it exists, and pass this bitset as input to the scanner.
This closes #1471
commit aa3f2ff731fa6e0004dea827417c0d932d4a6291
Author: Jacky Li <jacky.likun@...>
Date: 2018-01-06T12:28:44Z
[CARBONDATA-1992] Remove partitionId in CarbonTablePath
In CarbonTablePath, there is a deprecated partition id which is always 0;
it should be removed to avoid confusion.
This closes #1765
commit 3ba31a162dc66bc5ee9023c7ff466c7de4c31c50
Author: Jacky Li <jacky.likun@...>
Date: 2018-01-30T13:24:04Z
[CARBONDATA-2099] Refactor query scan process to improve readability
Unified concepts in the scan process flow:
1. QueryModel contains all parameters for a scan; it is created by an API in
CarbonTable. (In future, CarbonTable will be the entry point for various table
operations.)
2. Use the term ColumnChunk to represent one column in one blocklet, and use
ChunkIndex in the reader to read a specified column chunk.
3. Use the term ColumnPage to represent one page in one ColumnChunk.
4. QueryColumn => ProjectionColumn, indicating it is for projection.
This closes #1874
commit 810f093c28dc9e8a70a04bef1bc701569ec4261e
Author: Jacky Li <jacky.likun@...>
Date: 2018-01-31T08:14:27Z
[CARBONDATA-2025] Unify all path construction through CarbonTablePath
static method
Refactor CarbonTablePath:
1. Remove CarbonStorePath and use CarbonTablePath only.
2. Make CarbonTablePath a utility without object creation; this avoids
creating an object before using it, so the code is cleaner and GC pressure is lower.
This closes #1768
commit 5a91a4cf49e3554f95f88637d93b51c80bf5329f
Author: xuchuanyin <xuchuanyin@...>
Date: 2018-02-08T06:42:39Z
[CARBONDATA-2023][DataLoad] Add size base block allocation in data loading
CarbonData assigns blocks to nodes at the beginning of data loading.
The previous block allocation strategy was based on block count, and it
suffers from data skew when the sizes of the input files differ a lot.
We introduce a size-based block allocation strategy to optimize data
loading performance in skewed data scenarios.
This closes #1808
commit 667303e7dfa515cda7cd3e34c736b74b5e246c29
Author: xuchuanyin <xuchuanyin@...>
Date: 2018-02-08T07:39:45Z
[HotFix][CheckStyle] Fix import related checkstyle
This closes #1952
commit 442350f6cbc908ea02ec6ef5f8d5b748b63d73d9
Author: Jacky Li <jacky.likun@...>
Date: 2018-02-27T03:26:30Z
[REBASE] Solve conflict after merging master
commit ea51dbf0d0d03d5cf9a946594cec61e4d9a2a46d
Author: Jacky Li <jacky.likun@...>
Date: 2018-02-10T02:34:59Z
Revert "[CARBONDATA-2023][DataLoad] Add size base block allocation in data
loading"
This reverts commit 6dd8b038fc898dbf48ad30adfc870c19eb38e3d0.
commit d13f01bfb7bf84fd8a231300219cbc4818eabe5b
Author: sounakr <sounakr@...>
Date: 2018-02-24T02:25:14Z
File Format Reader
commit 06b0c74edbc6097ada28382f27c54905a1b07159
Author: sounakr <sounakr@...>
Date: 2018-02-26T11:58:47Z
File Format Phase 2
commit 372b380470600c03a2f723b53a106a5ce0087ae9
Author: Ajantha-Bhat <ajanthabhat@...>
Date: 2018-02-27T06:06:56Z
* File Format Phase 2 (cleanup code)
commit 8eb20a5dd9543029239a051bd978e855a69d805c
Author: Ajantha-Bhat <ajanthabhat@...>
Date: 2018-02-27T06:36:28Z
* File Format Phase 2 (cleanup code)
commit 462fd28cbc1268bbb529f947ee2e93c068e0d682
Author: Ajantha-Bhat <ajanthabhat@...>
Date: 2018-02-27T09:54:43Z
* File Format Phase 2 (cleanup code and adding testCase)
commit 952688b8cf1b17954b85af6143abcab77d081da8
Author: Ajantha-Bhat <ajanthabhat@...>
Date: 2018-02-27T11:58:37Z
* File Format Phase 2 (filter issue fix)
commit 87c84943122c8523291cc25751829ac143161469
Author: Ajantha-Bhat <ajanthabhat@...>
Date: 2018-02-27T12:20:46Z
* File Format Phase 2 (filter issue fix return value)
commit 3a0c3b9448c3cca0742db0f557518ffa12d0dabb
Author: sounakr <sounakr@...>
Date: 2018-02-27T13:55:16Z
Clear DataMap Cache
commit 1943cf6dcd266cd78483f137e0499083d95e4332
Author: Ajantha-Bhat <ajanthabhat@...>
Date: 2018-02-27T14:02:35Z
* File Format Phase 2 (test cases)
commit 4f97c7e35fade5fe0abb58b0c781a6b7f5b744e9
Author: sounakr <sounakr@...>
Date: 2018-02-28T03:18:45Z
Refactor CarbonFileInputFormat
commit 7df78cf50b658cc6fb79e28b0ad76f74dc8a680a
Author: Ajantha-Bhat <ajanthabhat@...>
Date: 2018-02-28T10:02:08Z
* File Format Phase 2
a. test cases addition
b. Exception handling when the files are not present
c. Setting the filter expression in carbonTableInputFormat
commit 4825fcc8d023c2b1a031ee0417addf5b6f2d5763
Author: Ajantha-Bhat <ajanthabhat@...>
Date: 2018-02-28T10:02:08Z
* File Format Phase 2
a. test cases addition
b. Exception handling when the files are not present
c. Setting the filter expression in carbonTableInputFormat
commit 5e5adbe21b8b786c13fda13e7e052bc5e46f22b4
Author: Ajantha-Bhat <ajanthabhat@...>
Date: 2018-02-28T10:02:08Z
* File Format Phase 2
a. test cases addition
b. Exception handling when the files are not present
c. Setting the filter expression in carbonTableInputFormat
commit b510faa9e033fb2ca0ae64125aee10709201e69f
Author: sounakr <sounakr@...>
Date: 2018-03-01T11:23:39Z
Map Reduce Test Case for CarbonInputFileFormat
----
---