GitHub user jackylk opened a pull request:
https://github.com/apache/carbondata/pull/1967
[CARBONDATA-1997] Add CarbonWriter API
Added a CarbonWriter API that can be used to write CarbonData files to a
specified folder.
1. Added a store-sdk module, including a CarbonWriter API to write carbon
files to a specified table location. This writer still uses the current table
folder structure.
2. Added a Strings utility class for mkString functionality.
3. Added TableSchemaBuilder and CarbonTableBuilder in the core module.
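The Strings utility itself is not shown in this message; a minimal sketch of
mkString-style joining, assuming a class named Strings with a two-argument
mkString (the actual signature in the PR may differ):

```java
// Hypothetical sketch of a mkString-style helper; the real Strings class
// in the PR may use a different name or signature.
public class Strings {
    // Joins elements with a separator, like Scala's mkString.
    public static String mkString(String[] values, String separator) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) sb.append(separator);
            sb.append(values[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(mkString(new String[]{"a", "b", "c"}, ", ")); // prints a, b, c
    }
}
```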
- [X] Any interfaces changed?
Public user API added
- [X] Any backward compatibility impacted?
No
- [X] Document update required?
Yes
- [X] Testing done
UT added
- [X] For large changes, please consider breaking it into sub-tasks under
an umbrella JIRA.
NA
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jackylk/incubator-carbondata sdk
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/carbondata/pull/1967.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1967
----
commit 952665a8c1c52f28951463fef989333ae0e6d83e
Author: Jacky Li <jacky.likun@...>
Date: 2018-01-06T12:28:44Z
[CARBONDATA-1992] Remove partitionId in CarbonTablePath
In CarbonTablePath, there is a deprecated partition id which is always 0; it
is removed to avoid confusion.
This closes #1765
commit 111c3821557820241d1114d87eae2f7cd017e610
Author: Jacky Li <jacky.likun@...>
Date: 2018-01-02T15:46:14Z
[CARBONDATA-1968] Add external table support
This PR adds support for creating external table with existing carbondata
files, using Hive syntax.
CREATE EXTERNAL TABLE tableName STORED BY 'carbondata' LOCATION 'path'
This closes #1749
commit 80b42ac662ebd2bc243ca91c86b035717223daf4
Author: SangeetaGulia <sangeeta.gulia@...>
Date: 2017-09-21T09:26:26Z
[CARBONDATA-1827] S3 Carbon Implementation
1. Provide support for S3 in CarbonData.
2. Added S3Example to create a carbon table on S3.
3. Added S3CSVExample to load a carbon table using CSV from S3.
This closes #1805
commit 71c2d8ca4a3212cff1eedbe78ee03e521f57fbbc
Author: Jacky Li <jacky.likun@...>
Date: 2018-01-31T16:25:31Z
[REBASE] Solve conflict after rebasing master
commit 15b4e192ee904a2e7c845ac67e0fcf1ba151a683
Author: Jacky Li <jacky.likun@...>
Date: 2018-01-30T13:24:04Z
[CARBONDATA-2099] Refactor query scan process to improve readability
Unified concepts in the scan process flow:
1. QueryModel contains all parameters for a scan; it is created by an API in
CarbonTable. (In the future, CarbonTable will be the entry point for various
table operations.)
2. Use the term ColumnChunk to represent one column in one blocklet, and use
ChunkIndex in the reader to read a specified column chunk.
3. Use the term ColumnPage to represent one page in one ColumnChunk.
4. QueryColumn => ProjectionColumn, indicating it is for projection.
This closes #1874
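The renamed concepts form a simple containment hierarchy (a blocklet holds one
ColumnChunk per column; a chunk holds ColumnPages). A minimal illustrative
sketch with assumed field names, not the actual CarbonData classes:

```java
import java.util.Arrays;
import java.util.List;

// Illustrative containment only; the real reader classes are far richer.
class ColumnPage {                 // one page of values within a chunk
    final int[] values;
    ColumnPage(int[] values) { this.values = values; }
}

class ColumnChunk {                // one column within one blocklet
    final List<ColumnPage> pages;
    ColumnChunk(List<ColumnPage> pages) { this.pages = pages; }
    int totalValues() {
        return pages.stream().mapToInt(p -> p.values.length).sum();
    }
}

public class ScanModelSketch {
    public static void main(String[] args) {
        ColumnChunk chunk = new ColumnChunk(Arrays.asList(
            new ColumnPage(new int[]{1, 2, 3}),
            new ColumnPage(new int[]{4, 5})));
        System.out.println(chunk.totalValues()); // prints 5
    }
}
```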
commit c3e99681bcd397ed33bc90e8d73b1fd33e0e60f7
Author: Jacky Li <jacky.likun@...>
Date: 2018-01-31T08:14:27Z
[CARBONDATA-2025] Unify all path construction through CarbonTablePath
static method
Refactor CarbonTablePath:
1. Remove CarbonStorePath and use CarbonTablePath only.
2. Make CarbonTablePath a utility class that requires no object creation; this
avoids instantiating an object before each use, so the code is cleaner and GC
pressure is lower.
This closes #1768
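The object-free utility style described above can be sketched as follows; the
class name, method names, and path layout here are assumptions for
illustration, not CarbonTablePath's real API:

```java
// Illustrative static-utility pattern: no instances are ever created, so
// there is no per-call allocation and nothing extra for the GC to collect.
public final class TablePathUtil {
    private TablePathUtil() { }    // prevent instantiation

    // Hypothetical layout: <storePath>/<database>/<table>
    public static String getTablePath(String storePath, String db, String table) {
        return storePath + "/" + db + "/" + table;
    }

    public static String getMetadataPath(String tablePath) {
        return tablePath + "/Metadata";
    }

    public static void main(String[] args) {
        System.out.println(getMetadataPath(getTablePath("/store", "db1", "t1")));
    }
}
```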
commit e502c59a2d0b95d80db3aff04c749654254eadbe
Author: Jatin <jatin.demla@...>
Date: 2018-01-25T11:23:00Z
[CARBONDATA-2080] [S3-Implementation] Propagated hadoopConf from driver to
executor for s3 implementation in cluster mode.
Problem: hadoopConf was not being propagated from the driver to the executors,
which is why loads failed in a distributed environment.
Solution: set the Hadoop conf in the base class CarbonRDD.
How to verify this PR:
Execute a load in cluster mode using an S3 location; it should succeed.
This closes #1860
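The fix pattern, snapshotting driver-side configuration inside a serializable
base class so it travels to executors with the task, can be sketched
generically; the class and field names below are assumptions, not CarbonRDD's
actual code:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Generic sketch: a serializable task snapshots driver-side configuration
// in its constructor, so executors see it after deserialization.
public class ConfCarryingTask implements Serializable {
    private final HashMap<String, String> confSnapshot;

    public ConfCarryingTask(Map<String, String> driverConf) {
        this.confSnapshot = new HashMap<>(driverConf);   // taken on the driver
    }

    public String get(String key) {                      // read on the executor
        return confSnapshot.get(key);
    }

    // Simulates shipping the task to an executor via Java serialization.
    public static ConfCarryingTask roundTrip(ConfCarryingTask t) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            ObjectOutputStream oos = new ObjectOutputStream(bos);
            oos.writeObject(t);
            oos.flush();
            ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()));
            return (ConfCarryingTask) in.readObject();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("fs.s3a.endpoint", "s3.example.com");   // hypothetical key/value
        ConfCarryingTask shipped = roundTrip(new ConfCarryingTask(conf));
        System.out.println(shipped.get("fs.s3a.endpoint")); // prints s3.example.com
    }
}
```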
commit cae74a8cecea74e8899a87dcb7d12e0dec1b8069
Author: sounakr <sounakr@...>
Date: 2017-09-28T10:51:05Z
[CARBONDATA-1480]Min Max Index Example for DataMap
DataMap example: an implementation of a Min Max index through the DataMap
interface, using the index while pruning.
This closes #1359
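The min/max pruning idea: keep the per-block minimum and maximum of a column,
and skip any block whose range cannot contain the filter value. A
self-contained sketch (class and method names are assumed for illustration):

```java
// Self-contained sketch of min/max pruning: a block can be skipped whenever
// the filter value lies outside its recorded [min, max] range.
public class MinMaxPrune {
    static final class BlockIndex {
        final int min, max;
        BlockIndex(int min, int max) { this.min = min; this.max = max; }
    }

    // Returns true if the block might contain the value (must be scanned);
    // false means the block is safely pruned.
    public static boolean mightContain(BlockIndex idx, int value) {
        return value >= idx.min && value <= idx.max;
    }

    public static void main(String[] args) {
        BlockIndex block = new BlockIndex(10, 20);
        System.out.println(mightContain(block, 15)); // prints true: scan it
        System.out.println(mightContain(block, 25)); // prints false: prune it
    }
}
```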
commit e972fd3d5cc8f392d47ca111b2d8f262edb29ac6
Author: ravipesala <ravi.pesala@...>
Date: 2017-11-15T14:18:40Z
[CARBONDATA-1544][Datamap] Datamap FineGrain implementation
Implemented interfaces for the FG datamap and integrated them into the filter
scanner to use the pruned bitset from the FG datamap.
The FG query flow is as follows:
1. The user can add an FG datamap to any table and implement its interfaces.
2. Any filter query that hits a table with a datamap will call the prune
method of the FG datamap.
3. The prune method of the FG datamap returns a list of FineGrainBlocklet;
these blocklets contain block, blocklet, page, and rowid information.
4. The pruned blocklets are internally written to a file, and only the block,
blocklet, and file path information is returned as part of the splits.
5. Based on the splits, the scan RDD schedules the tasks.
6. In the filter scanner, we check the datamap writer path from the split and
read the bitset if it exists, then pass this bitset as input to the scanner.
This closes #1471
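Steps 4 and 6 above (writing the pruned bitset to a file, then re-reading it
in the filter scanner) can be sketched with java.util.BitSet; the file name
and layout here are assumptions, not the PR's actual format:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.BitSet;

// Sketch of steps 4 and 6: persist the pruned-row bitset to a file beside
// the split, then read it back in the scanner to restrict the rows scanned.
public class BitsetSpill {
    public static Path writeBitset(BitSet pruned, Path dir) {
        try {
            Path file = dir.resolve("datamap.bitset");   // hypothetical file name
            Files.write(file, pruned.toByteArray());
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static BitSet readBitset(Path file) {
        try {
            return BitSet.valueOf(Files.readAllBytes(file));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Demonstrates a full write/read round trip in a temporary directory.
    public static boolean roundTripDemo() {
        try {
            BitSet pruned = new BitSet();
            pruned.set(3);
            pruned.set(42);
            Path dir = Files.createTempDirectory("fg-datamap");
            return readBitset(writeBitset(pruned, dir)).equals(pruned);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTripDemo()); // prints true
    }
}
```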
commit 6ea310dd745277b6c8f3acef296fcf8f380ea296
Author: Jacky Li <jacky.likun@...>
Date: 2018-02-10T10:21:18Z
add sdk api
----
---