GitHub user xubo245 opened a pull request:
https://github.com/apache/carbondata/pull/2691
[CARBONDATA-2912] Support CSV table load csv data with spark2.2
In branch-1.3, a CSV table cannot load CSV data with Spark 2.2.
Carbon needs to upgrade its commons-lang3 version.
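The PR text does not state the target version, but Spark 2.2 ships commons-lang3 3.5, so the upgrade would amount to aligning the dependency in the parent pom along these lines (the version number here is an assumption, not taken from this PR):

```xml
<!-- Illustrative only: align commons-lang3 with the version Spark 2.2 uses. -->
<dependency>
  <groupId>org.apache.commons</groupId>
  <artifactId>commons-lang3</artifactId>
  <version>3.5</version>
</dependency>
```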
Be sure to do all of the following checklist to help us incorporate
your contribution quickly and easily:
- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [ ] Testing done
Please provide details on
- Whether new unit test cases have been added or why no new tests
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance
test report.
- Any additional information to help reviewers in testing this
change.
- [ ] For large changes, please consider breaking it into sub-tasks under
an umbrella JIRA.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/xubo245/carbondata CARBONDATA-2912_twoInsert1.3.2
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/carbondata/pull/2691.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2691
----
commit c055c8f33123bfb6e1103456bea23a0ff8c944ca
Author: ravipesala <ravi.pesala@...>
Date: 2018-02-03T20:31:00Z
[maven-release-plugin] prepare release apache-carbondata-1.3.0-rc2
commit 607b4cef646b2b9a3c2a8fc687dc40342165979a
Author: ravipesala <ravi.pesala@...>
Date: 2018-02-03T20:31:53Z
[maven-release-plugin] prepare for next development iteration
commit 449668ad9cda869b14f31dcc2c6df6454701cddc
Author: dhatchayani <dhatcha.official@...>
Date: 2018-02-05T10:51:09Z
[CARBONDATA-2131] Alter table adding long datatype is failing but Create
table with long type is successful, in Spark 2.1
Modified code to make the data types supported by "create table" and those
supported by "alter add columns" consistent
This closes #1932
commit a3b97f38412cf96ee041b6ebfbd7c39af54e391d
Author: kumarvishal <kumarvishal.1802@...>
Date: 2018-02-05T09:47:02Z
[CARBONDATA-2142] Fixed Pre-Aggregate datamap creation issue
Fixed the issue with reverting changes when pre-aggregate datamap
creation fails
Removed look-up while creating the pre-aggregate data map
Removed unused code
This closes #1943
commit 2c5ecfbfe5ce3357d041207cad8edcf587e4115f
Author: akashrn5 <akashnilugal@...>
Date: 2018-02-07T13:14:33Z
[CARBONDATA-2119] Deserialization issue for carbonloadmodel
Problem:
Load model was not getting de-serialized in the executor due to which 2
different carbon table objects were being created.
Solution:
Reconstruct carbonTable from tableInfo if not already created.
This closes #1947
commit 8b105a1e1f6e7e7e3b0bc13d44c1bf93fd821e31
Author: m00258959 <manish.gupta@...>
Date: 2018-02-07T06:37:33Z
[CARBONDATA-2143] Fixed query memory leak issue for task failure during
initialization of record reader
Problem:
Whenever a query is executed, the record reader is initialized in the
internalCompute method of the CarbonScanRdd class. A task completion listener
is attached to each task after initialization of the record reader.
During record reader initialization, the queryResultIterator is initialized
and one blocklet is processed. The processed blocklet will use available
unsafe memory.
Let's say there are 100 columns and 80 of them get space, but there is no
space left for the remaining columns to be stored in unsafe memory. This
results in a memory exception, record reader initialization fails, and the
query fails.
In this case the unsafe memory allocated for the 80 columns is not freed and
remains occupied for as long as the JVM process persists.
Impact
This is a memory leak in the system and can lead to failures for queries
executed after one query fails due to the above reason.
Solution:
Attach the task completion listener before record reader initialization so
that if the query fails at the very first instance after using unsafe memory,
still that memory will be cleared.
This closes #1948
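The fix above is an ordering pattern that can be sketched outside Carbon: register the cleanup callback before the fallible initialization, so partial allocations are released even when initialization throws. A minimal Python sketch, with all class and function names hypothetical:

```python
class TaskContext:
    """Hypothetical stand-in for Spark's per-task listener registry."""
    def __init__(self):
        self.listeners = []

    def add_completion_listener(self, fn):
        self.listeners.append(fn)

    def complete(self):
        # Task completion fires every registered listener.
        for fn in self.listeners:
            fn()

allocated = []  # simulated unsafe-memory allocations

def init_record_reader(columns, budget):
    # Allocate per-column buffers; fail once the memory budget is exhausted.
    for col in range(columns):
        if col >= budget:
            raise MemoryError("no unsafe memory left for column %d" % col)
        allocated.append(col)

def run_query(ctx, columns, budget):
    # Key point of the fix: attach the cleanup listener BEFORE initialization,
    # so a failure mid-initialization still frees the partial allocations.
    ctx.add_completion_listener(allocated.clear)
    try:
        init_record_reader(columns, budget)
    finally:
        ctx.complete()  # runs listeners even when initialization failed

ctx = TaskContext()
try:
    run_query(ctx, columns=100, budget=80)
except MemoryError:
    pass
print(len(allocated))  # prints 0: the 80 partial allocations were freed
```

Had the listener been attached after `init_record_reader`, the `MemoryError` would skip the registration and the 80 allocations would stay occupied, which is exactly the leak described.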
commit 9f73f0e60611c52278d2d475a89d42adebf32f60
Author: m00258959 <manish.gupta@...>
Date: 2018-02-05T11:40:18Z
[CARBONDATA-2134] Prevent implicit column filter list from getting
serialized while submitting task to executor
Problem
In the current store, blocklet pruning happens in the driver and no further
pruning takes place on the executor side, but the implicit column filter list
is still sent to the executor. As the size of the list grows, the cost of
serializing and deserializing it increases, which can impact query
performance.
Solution
Remove the list from the filter expression before submitting the task to
executor.
This closes #1935
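The idea can be sketched generically (field and function names hypothetical, not Carbon's actual code): since the executor does no further pruning, the driver strips the bulky implicit block list from the filter expression before the task is serialized.

```python
import pickle

def strip_implicit_filters(expr):
    """Return a copy of the filter expression without the driver-only
    implicit block list, which the executor never consults."""
    slim = dict(expr)
    slim.pop("implicit_block_list", None)
    return slim

# A filter expression whose implicit list grows with the store size.
expr = {
    "predicate": "col1 > 100",
    "implicit_block_list": ["blocklet_%d" % i for i in range(100000)],
}

full = len(pickle.dumps(expr))
slim = len(pickle.dumps(strip_implicit_filters(expr)))
print(slim < full)  # prints True: the serialized payload shrinks
```

The executor still receives the predicate it actually evaluates; only the pruning metadata that was already consumed in the driver is dropped.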
commit 1137c285f55dfdc0de24bdebf81d78187df93f8a
Author: kunal642 <kunalkapoor642@...>
Date: 2018-02-08T06:20:23Z
[CARBONDATA-1763] Dropped table if exception thrown while creation
The pre-aggregate table was not getting dropped when creation failed because
exceptions from undoing the metadata were not handled.
If the pre-aggregate table is not registered with the main table (main table
update fails), then it is not dropped from the metastore.
This closes #1951
commit 6e435de5e04ace63fe5b105e2f180ef0932d80d3
Author: rahulforallp <rahul.kumar@...>
Date: 2018-02-06T13:11:35Z
[CARBONDATA-2137] Delete query performance improved
Following is the configuration used :
SPARK_EXECUTOR_MEMORY : 200G
SPARK_DRIVER_MEMORY : 20G
SPARK_EXECUTOR_CORES : 32
SPARK_EXECUTOR_INSTANCEs : 3
Earlier the delete took about 20 minutes; now it takes approximately 5 minutes.
This closes #1937
commit bc3f825107517ad1e39a385c488beadd6022ab8e
Author: akashrn5 <akashnilugal@...>
Date: 2018-02-08T17:40:43Z
[CARBONDATA-2150] Unwanted updatetable status files are being generated for
the delete operation where no records are deleted
Problem:
Unwanted updatetable status files are being generated for the delete
operation where no records are deleted
Analysis:
When the filter value for the delete operation is less than the maximum value
in that column, getSplits() will return the block, and hence the delete logic
was creating an update table status file even though no delete was actually
performed.
Also added the Spark context to the create database event.
This closes #1957
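The guard described above amounts to writing the update table status file only when the delete actually touched rows. A minimal sketch, with all names hypothetical:

```python
status_files = []  # simulated update-table-status files

def delete_rows(rows, predicate):
    """Delete matching rows; write a status file only if something changed."""
    remaining = [r for r in rows if not predicate(r)]
    deleted = len(rows) - len(remaining)
    if deleted > 0:
        # Only a real delete justifies a new update table status file.
        status_files.append("status_%d" % len(status_files))
    return remaining, deleted

rows = [1, 5, 9]
# Filter value above the column maximum: the scan may still return the block,
# but no row matches, so no status file must be written.
remaining, deleted = delete_rows(rows, lambda r: r > 100)
print(deleted, len(status_files))  # prints 0 0
```

A delete that does match rows (e.g. `lambda r: r > 4`) removes two rows and writes exactly one status file, which is the intended behavior.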
commit 15cc7fa97722d055ad5627b3a915ee6d2b6817d6
Author: akashrn5 <akashnilugal@...>
Date: 2018-02-14T13:37:15Z
[CARBONDATA-2182] Added one more parameter called extraParams in
SessionParams and add carbonSessionInfo to CarbonEnvInitPreEvent
Added one more parameter called extraParams in SessionParams for
session-level operations, and pass carbonSessionInfo to the event so that
users can save session-level information in carbonSessionInfo.
This closes #1978
commit 27634deee82d7a1560e75f8dfc09333eb8df51db
Author: anubhav100 <anubhav.tarar@...>
Date: 2018-02-06T08:03:39Z
[CARBONDATA-2133] Fixed Exception displays after performing select query on
newly added Boolean Type
Problem: In the restructure util and RestructureBasedVectorResultCollector,
the case for the boolean data type was missing when getting the default value
of a measure type, and in DataTypeUtil the boolean case was missing when
storing the default value in bytes.
Solution: Add the boolean data type case.
This closes #1934
commit aff3b39efd772a881590432816369a05d0cb5855
Author: akashrn5 <akashnilugal@...>
Date: 2018-02-15T13:30:26Z
[CARBONDATA-2103] Optimize show tables for filtering datamaps
Problem
Show tables was taking more time because a lookup happened twice to filter
out the datamaps.
Solution
Add a hive table property which is true for all tables and false for
datamaps (such as pre-aggregate tables), and have show tables filter out
these tables based on the property.
This closes #1980
commit 7beef112b59c9ccfe14baca87ae841cfe77e4dce
Author: akashrn5 <akashnilugal@...>
Date: 2018-02-14T10:15:04Z
[CARBONDATA-2183] Fix compaction when segment is delete during compaction
and remove unnecessary parameters in functions
Problem:
When compaction has started and the job is running, and in parallel a segment
involved in the compaction is deleted using DeleteSegmentByID, the compaction
still succeeds.
Solution:
In that case the compaction should be aborted and marked failed, and a proper
error message should be thrown to the user. This PR also removes unnecessary
parameters in functions.
This closes #1979
commit 39ac94e462e6571414dee8f58c174e44a79f8ad4
Author: kunal642 <kunalkapoor642@...>
Date: 2018-02-12T19:23:31Z
[CARBONDATA-2142] [CARBONDATA-1763] Fixed issues while creation concurrent
datamaps
Analysis:
1. GenerateTableSchemaString in CarbonMetastore did not have any specific
implementation for the hive metastore, due to which carbon tables were being
cached in MetaData. As there is no way to refresh a table in the hive
metastore, this is wrong: all queries should get the latest carbon table
from the metastore and not from the cache.
2. If updating the main table status fails, the revertMainTableChanges
method is called to revert the changes. The revert logic was wrong, which led
to the wrong entry getting deleted from the schema.
3. Moved the force-remove logic before taking locks, as deletion from the
metastore should happen even if the lock is not present, because the table is
in a stale state (the entry is not in the parent but is available in the
metastore).
This closes #1975
commit c2785b352f7b7cb2dd524811b0696fb18c12d5b0
Author: BJangir <babulaljangir111@...>
Date: 2018-02-11T19:32:30Z
[CARBONDATA-2161] update mergeTo column for compacted segment of streaming
table
This closes #1971
commit f8a62a9bd8ba39cd6bc247c587a7a3e1afd99254
Author: QiangCai <qiangcai@...>
Date: 2018-02-11T08:06:01Z
[CARBONDATA-2151][Streaming] Fix filter query issue on streaming table
1. Fix filter query issues for timestamp, date, and decimal types
2. Add more test cases
dataType: int, streaming, float, double, decimal, timestamp, date, complex
operation: =, <>, >=, >, <, <=, in, like, between, is null, is not null
This closes #1969
commit 4bbbd4b1df444163cfb72cf74a05c1a9d09e1200
Author: BJangir <babulaljangir111@...>
Date: 2018-02-19T17:01:00Z
[CARBONDATA-2185] Add InputMetrics for Streaming Reader
This closes #1985
commit 6f9016db52dd3f9c31ba20e585debfc283e2594e
Author: Zhang Zhichao <441586683@...>
Date: 2018-02-09T09:32:54Z
[CARBONDATA-2149]Fix complex type data displaying error when use DataFrame
to write complex type data
The default values of 'complex_delimiter_level_1' and
'complex_delimiter_level_2' were wrong: they must be '$' and ':', not '$' and
'\:'. The escape character '\' needs to be added only when using the
delimiters in ArrayParserImpl or StructParserImpl.
This closes #1962
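The distinction above (store the delimiters as plain values, escape them only at parse time) can be sketched generically; the function below is a hypothetical illustration, not Carbon's parser:

```python
import re

# Default delimiters stored as plain values (from the commit): '$' and ':'.
LEVEL_1 = "$"  # separates elements of the complex value
LEVEL_2 = ":"  # separates fields within an element

def parse_array_of_structs(value):
    """Split a complex value. The escape is added only here, at split time,
    because '$' is a regex metacharacter - storing '\\$' as the default
    delimiter value itself was the bug."""
    elements = re.split(re.escape(LEVEL_1), value)
    return [tuple(e.split(LEVEL_2)) for e in elements]

print(parse_array_of_structs("a:1$b:2"))  # prints [('a', '1'), ('b', '2')]
```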
commit b0a2fabcc8584dfba24ad0ea135948f5365a7335
Author: QiangCai <qiangcai@...>
Date: 2018-02-25T10:53:41Z
[CARBONDATA-2200] Fix bug of LIKE operation on streaming table
A LIKE operation is converted to a StartsWith / EndsWith / Contains
expression, and Carbon uses RowLevelFilterExecuterImpl to evaluate it.
The streaming table should therefore also implement
RowLevelFilterExecuterImpl.
This closes #1996
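The LIKE-to-expression rewrite mentioned above can be sketched generically (a simplified illustration, not Carbon's actual code): a pattern 'abc%' becomes StartsWith, '%abc' becomes EndsWith, and '%abc%' becomes Contains.

```python
def like_to_predicate(pattern):
    """Map the simple LIKE shapes to string predicates, sketching the
    StartsWith / EndsWith / Contains rewrite (general patterns omitted)."""
    if pattern.startswith("%") and pattern.endswith("%"):
        needle = pattern[1:-1]
        return lambda s: needle in s           # Contains
    if pattern.endswith("%"):
        needle = pattern[:-1]
        return lambda s: s.startswith(needle)  # StartsWith
    if pattern.startswith("%"):
        needle = pattern[1:]
        return lambda s: s.endswith(needle)    # EndsWith
    return lambda s: s == pattern              # exact match

rows = ["stream_a", "batch_a", "a_stream"]
print([r for r in rows if like_to_predicate("stream%")(r)])   # ['stream_a']
print([r for r in rows if like_to_predicate("%stream")(r)])   # ['a_stream']
print([r for r in rows if like_to_predicate("%stream%")(r)])  # ['stream_a', 'a_stream']
```

A row-level executor then simply applies the resulting predicate to each row value, which is why the streaming reader also needs that execution path.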
commit e363dd1a68e2138591a930055dd1701a1245825f
Author: rahulforallp <rahul.kumar@...>
Date: 2018-02-25T09:55:26Z
[CARBONDATA-2201] NPE fixed while triggering the LoadTablePreExecutionEvent
before Streaming
While triggering the LoadTablePreExecutionEvent we require both the options
provided by the user and the finalOptions.
In the streaming case the two are the same; passing null may cause an NPE.
This closes #1997
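The defensive fix can be sketched as passing the same (possibly empty) options map twice rather than null; the function and option names below are hypothetical:

```python
def fire_load_table_pre_execution_event(user_options, final_options):
    """Event consumers iterate both maps; a null here is what caused the NPE,
    so reject it up front."""
    assert user_options is not None and final_options is not None
    return {"user": dict(user_options), "final": dict(final_options)}

# Streaming case: user options and final options are the same map, so pass it
# twice (or pass an empty dict) instead of None.
options = {"bad_records_action": "FORCE"}
event = fire_load_table_pre_execution_event(options, options)
print(event["user"] == event["final"])  # prints True
```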
commit 0f210c86ca3ee9f0fa845cdeaef418ed9253b6f8
Author: Zhang Zhichao <441586683@...>
Date: 2018-02-04T04:54:24Z
[MINOR]Remove dependency of Java 1.8
This closes #1928
commit 758d03e783e324f70b6599be7feb1951b1034f51
Author: ravipesala <ravi.pesala@...>
Date: 2018-02-09T04:07:02Z
[CARBONDATA-2168] Support global sort for standard hive partitioning
This closes #1972
commit 1997ca235f90b5746262c9654b685b9b6bd3f16a
Author: ravipesala <ravi.pesala@...>
Date: 2018-02-14T19:01:56Z
[CARBONDATA-2187][PARTITION] Partition restructure for new folder structure
and supporting partition location feature
This closes #1984
commit b51d8186a82818672067dfd0387af6ff505f940c
Author: Jatin <jatin.demla@...>
Date: 2018-02-23T11:26:17Z
[CARBONDATA-2199] Fixed Dimension column after restructure getting wrong
block datatype
Problem: Changing the datatype of a measure that is in sort_columns triggers
a restructure; after the restructure the datatype is changed back to the
actual datatype, and accessing the data with the changed datatype gives an
incorrect-length exception.
Solution: Store the datatype in DimensionInfo while restructuring and
access the same datatype to get the block data type.
This closes #1993
commit 7726b4f9b379b0eec4b9fff6571415f47fa55587
Author: Jatin <jatin.demla@...>
Date: 2018-02-27T10:43:40Z
[CARBONDATA-2207] Fix testcases after using hive metastore
CarbonTable was coming back null in the hive metastore case, so fetch it
from the metastore instead of from carbon.
This closes #2005
commit b360f9084f873bc096d7fabfde20730fbc752350
Author: chenliang613 <chenliang613@...>
Date: 2018-02-08T17:32:38Z
[HOTFIX] Add partition usage code
This closes #1956
commit b9a6b68658fd0f7f408102374b3ef31dcfe44cea
Author: akashrn5 <akashnilugal@...>
Date: 2018-02-28T11:58:43Z
[CARBONDATA-2217]fix drop partition for non existing partition and set
FactTimeStamp during compaction for partition table
Problem:
1) When drop partition is fired for a column which does not exist, it throws
a null pointer exception.
2) select * does not work when the clean files operation is fired after a
second level of compaction; it sometimes throws an exception.
3) A new segment is created for all segments if any one partition is dropped.
Solution:
1) Add a null check for the case where the column does not exist.
2) Give the fact files a different timestamp during compaction, to avoid
deletion of the files during clean files.
3) Write a new segment file only for the partition which is dropped, not for
all partitions.
4) This PR also contains a fix for creating a pre-aggregate table with the
same name as one already created in another database.
This closes #2017
commit 660190fb544e338acd131e7cc30de171e7600df6
Author: akashrn5 <akashnilugal@...>
Date: 2018-02-28T12:08:50Z
[CARBONDATA-2103]Make show datamaps configurable in show tables command
Make the listing of datamaps in show tables configurable:
A new carbon property called carbon.query.show.datamaps is added; by default
it is true, so show tables lists all tables, including main tables and
datamaps.
To filter datamaps out of show tables, configure this property as false.
This closes #2015
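The property-gated filtering can be sketched as follows. Only the property name carbon.query.show.datamaps comes from the commit; the table representation and helper names are hypothetical:

```python
def show_tables(tables, props):
    """List table names; hide datamap tables unless the property allows them."""
    show_datamaps = props.get("carbon.query.show.datamaps", "true") == "true"
    if show_datamaps:
        return [t["name"] for t in tables]
    return [t["name"] for t in tables if not t["is_datamap"]]

tables = [
    {"name": "maintable", "is_datamap": False},
    {"name": "maintable_agg0", "is_datamap": True},  # pre-aggregate datamap
]

print(show_tables(tables, {}))  # default true: lists both tables
print(show_tables(tables, {"carbon.query.show.datamaps": "false"}))  # main table only
```

Checking a per-table flag also avoids the double lookup the earlier commit removed: the decision is made from metadata already in hand.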
commit 092b5d58a50498a0a66bf6166907965612eb1fc5
Author: ravipesala <ravi.pesala@...>
Date: 2018-03-01T06:34:53Z
[CARBONDATA-2219] Added validation for external partition location to use
same schema.
This closes #2018
----
---