GitHub user bhavya411 opened a pull request:
https://github.com/apache/carbondata/pull/1581
[CARBONDATA-1779] GenericVectorizedReader
This PR removes the Spark dependency that the Presto integration module
incurred by using CarbonVectorizedRecordReader. It also consolidates
CarbonVectorizedRecordReader into a single implementation that all integration
modules can share.
Earlier versions of the Presto integration used Spark's ColumnarBatch, which
is not good practice. This PR provides our own implementations of ColumnVector
and VectorBatch to eliminate the Spark dependency altogether. The generic
ColumnVector can now be used by any integration module that needs a
VectorizedReader to speed up processing.
Some core module classes were changed so that they use Java data types
instead of Spark data types, Decimal being one of them.
This PR tries to keep the changes to the core module to a minimum.
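To illustrate what holding decimals as Java types can look like, here is a
minimal sketch. DecimalSketch and its methods are hypothetical names made up
for this example, not classes from the PR; it only assumes that a decimal is
kept as its unscaled bytes plus a scale and rebuilt as a java.math.BigDecimal.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical helper, not part of the PR: shows a decimal held as a plain
// java.math.BigDecimal rather than an engine-specific decimal type.
final class DecimalSketch {
    /** Rebuilds a decimal from its unscaled two's-complement bytes and scale. */
    static BigDecimal fromUnscaledBytes(byte[] unscaled, int scale) {
        return new BigDecimal(new BigInteger(unscaled), scale);
    }

    /** Splits a decimal into the unscaled bytes a column store could keep. */
    static byte[] toUnscaledBytes(BigDecimal value) {
        return value.unscaledValue().toByteArray();
    }
}
```

Because only java.math types appear in the signatures, a reader built this way
would not need any Spark class on the classpath to hand decimals to Presto.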
Newly Added Classes
1. CarbonColumnVectorImpl: This class implements the CarbonColumnVector
interface and provides methods to store data in a vector and to retrieve data
from it.
2. CarbonVectorBatch: This class creates a VectorizedRowBatch, which is a set
of rows organized with each column as a CarbonVector. It is the unit of query
execution, organized to minimize the cost per row and achieve low
cycles-per-instruction. The major fields are public by design to allow fast and
convenient access by the vectorized query execution code.
3. StructField: This class is used to pass the schema information to the
Carbon columnar batch.
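The relationship between the three classes above can be sketched roughly as
follows. This is a simplified illustration under stated assumptions, not the
code in the PR: SketchField, SketchColumnVector and SketchVectorBatch are
hypothetical stand-ins, and the real classes may store data in typed arrays
rather than an Object array.

```java
import java.util.Arrays;

// Hypothetical names throughout; none of these are the classes added by the PR.

/** Schema entry: a column name plus the Java type stored for it. */
final class SketchField {
    final String name;
    final Class<?> javaType;

    SketchField(String name, Class<?> javaType) {
        this.name = name;
        this.javaType = javaType;
    }
}

/** One column of a batch, backed by a plain Object array and a null mask. */
final class SketchColumnVector {
    private final Object[] data;
    private final boolean[] isNull;

    SketchColumnVector(int capacity) {
        this.data = new Object[capacity];
        this.isNull = new boolean[capacity];
    }

    void put(int rowId, Object value) {
        data[rowId] = value;
        isNull[rowId] = false;
    }

    void putNull(int rowId) {
        data[rowId] = null;
        isNull[rowId] = true;
    }

    boolean isNullAt(int rowId) {
        return isNull[rowId];
    }

    Object get(int rowId) {
        return data[rowId];
    }

    /** Clears all values so the vector can be refilled for the next batch. */
    void reset() {
        Arrays.fill(data, null);
        Arrays.fill(isNull, false);
    }
}

/** A set of rows stored column by column; fields are public for fast access. */
final class SketchVectorBatch {
    public final SketchField[] schema;
    public final SketchColumnVector[] columns;
    public int numRows;

    SketchVectorBatch(SketchField[] schema, int capacity) {
        this.schema = schema;
        this.columns = new SketchColumnVector[schema.length];
        for (int i = 0; i < schema.length; i++) {
            columns[i] = new SketchColumnVector(capacity);
        }
    }

    /** Empties every column and sets the row count back to zero. */
    void reset() {
        for (SketchColumnVector column : columns) {
            column.reset();
        }
        numRows = 0;
    }
}
```

A reader would fill columns[i] for each projected column, set numRows, hand
the batch to the engine, and call reset() before reading the next batch, so
the same arrays are reused instead of being reallocated per batch.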
No interfaces changed.
No backward compatibility impacted.
No Document update required.
[Yes] Testing done
- All unit test cases are passing; no new unit test cases were needed
as this PR implements a generic vectorized reader.
- Manual testing was completed as well.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/bhavya411/incubator-carbondata GenericPrestoVectorizedReader
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/carbondata/pull/1581.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1581
----
commit 943dd6d01063409788d988bbd766c8597dc7de21
Author: Bhavya <[email protected]>
Date: 2017-11-14T10:05:44Z
Added Generic vectorized Reader
----
---