GitHub user xubo245 opened a pull request:
https://github.com/apache/carbondata/pull/2691
[CARBONDATA-2912] Support CSV table load csv data with spark2.2
In branch-1.3, a CSV table cannot load CSV data with Spark 2.2.
Carbon needs to upgrade its commons-lang3 version.
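The PR text does not state the target version, but Spark 2.2 ships commons-lang3 3.5, so the upgrade would amount to aligning the dependency in the parent pom along these lines (the version number here is an assumption, not taken from this PR):

```xml
<!-- Illustrative only: align commons-lang3 with the version Spark 2.2 uses. -->
<dependency>
  <groupId>org.apache.commons</groupId>
  <artifactId>commons-lang3</artifactId>
  <version>3.5</version>
</dependency>
```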
Be sure to do all of the following checklist to help us incorporate
your contribution quickly and easily:
- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [ ] Testing done
Please provide details on
- Whether new unit test cases have been added or why no new tests
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance
test report.
- Any additional information to help reviewers in testing this
change.
- [ ] For large changes, please consider breaking it into sub-tasks under
an umbrella JIRA.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/xubo245/carbondata CARBONDATA-2912_twoInsert1.3.2
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/carbondata/pull/2691.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2691
----
commit c055c8f33123bfb6e1103456bea23a0ff8c944ca
Author: ravipesala <ravi.pesala@...>
Date: 2018-02-03T20:31:00Z
[maven-release-plugin] prepare release apache-carbondata-1.3.0-rc2
commit 607b4cef646b2b9a3c2a8fc687dc40342165979a
Author: ravipesala <ravi.pesala@...>
Date: 2018-02-03T20:31:53Z
[maven-release-plugin] prepare for next development iteration
commit 449668ad9cda869b14f31dcc2c6df6454701cddc
Author: dhatchayani <dhatcha.official@...>
Date: 2018-02-05T10:51:09Z
[CARBONDATA-2131] Alter table adding long datatype is failing but Create
table with long type is successful, in Spark 2.1
Modified code to make the data types supported by "create table" and those
supported by "alter add columns" consistent
This closes #1932
commit a3b97f38412cf96ee041b6ebfbd7c39af54e391d
Author: kumarvishal <kumarvishal.1802@...>
Date: 2018-02-05T09:47:02Z
[CARBONDATA-2142] Fixed Pre-Aggregate datamap creation issue
Fixed the issue with reverting changes when pre-aggregate datamap
creation fails
Removed look-up while creating the pre-aggregate data map
Removed unused code
This closes #1943
commit 2c5ecfbfe5ce3357d041207cad8edcf587e4115f
Author: akashrn5 <akashnilugal@...>
Date: 2018-02-07T13:14:33Z
[CARBONDATA-2119] Deserialization issue for carbonloadmodel
Problem:
Load model was not getting de-serialized in the executor due to which 2
different carbon table objects were being created.
Solution:
Reconstruct carbonTable from tableInfo if not already created.
This closes #1947
commit 8b105a1e1f6e7e7e3b0bc13d44c1bf93fd821e31
Author: m00258959 <manish.gupta@...>
Date: 2018-02-07T06:37:33Z
[CARBONDATA-2143] Fixed query memory leak issue for task failure during
initialization of record reader
Problem:
Whenever a query is executed, the record reader is initialized in the
internalCompute method of the CarbonScanRdd class. A task completion listener
is attached to each task after initialization of the record reader.
During record reader initialization, the queryResultIterator is initialized
and one blocklet is processed. The processed blocklet will use available
unsafe memory.
Let's say there are 100 columns and 80 of them get space, but there is no
space left for the remaining columns to be stored in unsafe memory. This
results in a memory exception, record reader initialization fails, and the
query fails.
In this case the unsafe memory allocated for the 80 columns is not freed and
remains occupied for as long as the JVM process persists.
Impact
This is a memory leak in the system and can lead to failures for queries
executed after one query fails due to the above reason.
Solution:
Attach the task completion listener before record reader initialization so
that if the query fails at the very first instance after using unsafe memory,
still that memory will be cleared.
This closes #1948
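The fix above is an ordering pattern that can be sketched outside Carbon: register the cleanup callback before the fallible initialization, so partial allocations are released even when initialization throws. A minimal Python sketch, with all class and function names hypothetical:

```python
class TaskContext:
    """Hypothetical stand-in for Spark's per-task listener registry."""
    def __init__(self):
        self.listeners = []

    def add_completion_listener(self, fn):
        self.listeners.append(fn)

    def complete(self):
        # Task completion fires every registered listener.
        for fn in self.listeners:
            fn()

allocated = []  # simulated unsafe-memory allocations

def init_record_reader(columns, budget):
    # Allocate per-column buffers; fail once the memory budget is exhausted.
    for col in range(columns):
        if col >= budget:
            raise MemoryError("no unsafe memory left for column %d" % col)
        allocated.append(col)

def run_query(ctx, columns, budget):
    # Key point of the fix: attach the cleanup listener BEFORE initialization,
    # so a failure mid-initialization still frees the partial allocations.
    ctx.add_completion_listener(allocated.clear)
    try:
        init_record_reader(columns, budget)
    finally:
        ctx.complete()  # runs listeners even when initialization failed

ctx = TaskContext()
try:
    run_query(ctx, columns=100, budget=80)
except MemoryError:
    pass
print(len(allocated))  # prints 0: the 80 partial allocations were freed
```

Had the listener been attached after `init_record_reader`, the `MemoryError` would skip the registration and the 80 allocations would stay occupied, which is exactly the leak described.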
commit 9f73f0e60611c52278d2d475a89d42adebf32f60
Author: m00258959 <manish.gupta@...>
Date: 2018-02-05T11:40:18Z
[CARBONDATA-2134] Prevent implicit column filter list from getting
serialized while submitting task to executor
Problem
In the current store, blocklet pruning happens in the driver and no further
pruning takes place on the executor side, but the implicit column filter list
is still sent to the executor. As the size of the list grows, the cost of
serializing and deserializing it increases, which can impact query
performance.
Solution
Remove the list from the filter expression before submitting the task to
executor.
This closes #1935
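The idea can be sketched generically (field and function names hypothetical, not Carbon's actual code): since the executor does no further pruning, the driver strips the bulky implicit block list from the filter expression before the task is serialized.

```python
import pickle

def strip_implicit_filters(expr):
    """Return a copy of the filter expression without the driver-only
    implicit block list, which the executor never consults."""
    slim = dict(expr)
    slim.pop("implicit_block_list", None)
    return slim

# A filter expression whose implicit list grows with the store size.
expr = {
    "predicate": "col1 > 100",
    "implicit_block_list": ["blocklet_%d" % i for i in range(100000)],
}

full = len(pickle.dumps(expr))
slim = len(pickle.dumps(strip_implicit_filters(expr)))
print(slim < full)  # prints True: the serialized payload shrinks
```

The executor still receives the predicate it actually evaluates; only the pruning metadata that was already consumed in the driver is dropped.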
commit 1137c285f55dfdc0de24bdebf81d78187df93f8a
Author: kunal642 <kunalkapoor642@...>
Date: 2018-02-08T06:20:23Z
[CARBONDATA-1763] Dropped table if exception thrown while creation
The pre-aggregate table was not getting dropped when creation failed because
exceptions from undoing the metadata were not handled.
If the pre-aggregate table is not registered with the main table (main table
update fails), then it is not dropped from the metastore.
This closes #1951
commit 6e435de5e04ace63fe5b105e2f180ef0932d80d3
Author: rahulforallp <rahul.kumar@...>
Date: 2018-02-06T13:11:35Z
[CARBONDATA-2137] Delete query performance improved
Following is the configuration used :
SPARK_EXECUTOR_MEMORY : 200G
SPARK_DRIVER_MEMORY : 20G
SPARK_EXECUTOR_CORES : 32
SPARK_EXECUTOR_INSTANCEs : 3
Earlier the delete took about 20 minutes; now it takes approximately 5 minutes.
This closes #1937
commit bc3f825107517ad1e39a385c488beadd6022ab8e
Author: akashrn5 <akashnilugal@...>
Date: 2018-02-08T17:40:43Z
[CARBONDATA-2150] Unwanted updatetable status files are being generated for
the delete operation where no records are deleted
Problem:
Unwanted updatetable status files are being generated for the delete
operation where no records are deleted
Analysis:
When the filter value for the delete operation is less than the maximum value
in that column, getSplits() will return the block, and hence the delete logic
was creating an update table status file even though no delete was actually
performed.
Also added the Spark context to the create database event.
This closes #1957
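The guard described above amounts to writing the update table status file only when the delete actually touched rows. A minimal sketch, with all names hypothetical:

```python
status_files = []  # simulated update-table-status files

def delete_rows(rows, predicate):
    """Delete matching rows; write a status file only if something changed."""
    remaining = [r for r in rows if not predicate(r)]
    deleted = len(rows) - len(remaining)
    if deleted > 0:
        # Only a real delete justifies a new update table status file.
        status_files.append("status_%d" % len(status_files))
    return remaining, deleted

rows = [1, 5, 9]
# Filter value above the column maximum: the scan may still return the block,
# but no row matches, so no status file must be written.
remaining, deleted = delete_rows(rows, lambda r: r > 100)
print(deleted, len(status_files))  # prints 0 0
```

A delete that does match rows (e.g. `lambda r: r > 4`) removes two rows and writes exactly one status file, which is the intended behavior.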
commit 15cc7fa97722d055ad5627b3a915ee6d2b6817d6
Author: akashrn5 <akashnilugal@...>
Date: 2018-02-14T13:37:15Z
[CARBONDATA-2182] Added one more parameter called extraParams in
SessionParams and add carbonSessionInfo to CarbonEnvInitPreEvent
Added one more parameter called extraParams in SessionParams for
session-level operations, and pass carbonSessionInfo to the event so that
users can save session-level information in carbonSessionInfo.
This closes #1978
commit 27634deee82d7a1560e75f8dfc09333eb8df51db
Author: anubhav100 <anubhav.tarar@...>
Date: 2018-02-06T08:03:39Z
[CARBONDATA-2133] Fixed Exception displays after performing select query on
newly added Boolean Type
Problem: In the restructure util and RestructureBasedVectorResultCollector,
the case for the boolean data type was missing when getting the default value
of a measure type, and in DataTypeUtil the boolean case was missing when
storing the default value in bytes.
Solution: Add the boolean data type case.
This closes #1934
commit aff3b39efd772a881590432816369a05d0cb5855
Author: akashrn5 <akashnilugal@...>
Date: 2018-02-15T13:30:26Z
[CARBONDATA-2103] Optimize show tables for filtering datamaps
Problem
Show tables was taking more time because a lookup happened twice to filter
out the datamaps.
Solution
Add a hive table property which is true for all tables and false for
datamaps (such as pre-aggregate tables), and have show tables filter out
these tables based on the property.
This closes #1980
commit 7beef112b59c9ccfe14baca87ae841cfe77e4dce
Author: akashrn5 <akashnilugal@...>
Date: 2018-02-14T10:15:04Z
[CARBONDATA-2183] Fix compaction when segment is delete during compaction
and remove unnecessary parameters in functions
Problem:
When compaction has started and the job is running, and in parallel a segment
involved in the compaction is deleted using DeleteSegmentByID, the compaction
still succeeds.
Solution:
In that case the compaction should be aborted and marked failed, and a proper
error message should be thrown to the user. This PR also removes unnecessary
parameters in functions.
This closes #1979
commit 39ac94e462e6571414dee8f58c174e44a79f8ad4
Author: kunal642 <kunalkapoor642@...>
Date: 2018-02-12T19:23:31Z
[CARBONDATA-2142] [CARBONDATA-1763] Fixed issues while creation concurrent
datamaps
Analysis:
1. GenerateTableSchemaString in CarbonMetastore did not have any specific
implementation for the hive metastore, due to which carbon tables were being
cached in MetaData. As there is no way to refresh a table in the hive
metastore, this is wrong: all queries should get the latest carbon table
from the metastore and not from the cache.
2. If updating the main table status fails, the revertMainTableChanges
method is called to revert the changes. The revert logic was wrong, which led
to the wrong entry getting deleted from the schema.
3. Moved the force-remove logic before taking locks, as deletion from the
metastore should happen even if the lock is not present, because the table is
in a stale state (the entry is not in the parent but is available in the
metastore).
This closes #1975
commit c2785b352f7b7cb2dd524811b0696fb18c12d5b0
Author: BJangir <babulaljangir111@...>
Date: 2018-02-11T19:32:30Z
[CARBONDATA-2161] update mergeTo column for compacted segment of streaming
table
This closes #1971
commit f8a62a9bd8ba39cd6bc247c587a7a3e1afd99254
Author: QiangCai <qiangcai@...>
Date: 2018-02-11T08:06:01Z
[CARBONDATA-2151][Streaming] Fix filter query issue on streaming table
1. Fix filter query issues for timestamp, date, and decimal types
2. Add more test cases
dataType: int, streaming, float, double, decimal, timestamp, date, complex
operation: =, <>, >=, >, <, <=, in, like, between, is null, is not null
This closes #1969
commit 4bbbd4b1df444163cfb72cf74a05c1a9d09e1200
Author: BJangir <babulaljangir111@...>
Date: 2018-02-19T17:01:00Z
[CARBONDATA-2185] Add InputMetrics for Streaming Reader
This closes #1985
commit 6f9016db52dd3f9c31ba20e585debfc283e2594e
Author: Zhang Zhichao <441586683@...>
Date: 2018-02-09T09:32:54Z
[CARBONDATA-2149]Fix complex type data displaying error when use DataFrame
to write complex type data
The default values of 'complex_delimiter_level_1' and
'complex_delimiter_level_2' were wrong: they must be '$' and ':', not '$' and
'\:'. The escape character '\' needs to be added only when using the
delimiters in ArrayParserImpl or StructParserImpl.
This closes #1962
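The distinction above (store the delimiters as plain values, escape them only at parse time) can be sketched generically; the function below is a hypothetical illustration, not Carbon's parser:

```python
import re

# Default delimiters stored as plain values (from the commit): '$' and ':'.
LEVEL_1 = "$"  # separates elements of the complex value
LEVEL_2 = ":"  # separates fields within an element

def parse_array_of_structs(value):
    """Split a complex value. The escape is added only here, at split time,
    because '$' is a regex metacharacter - storing '\\$' as the default
    delimiter value itself was the bug."""
    elements = re.split(re.escape(LEVEL_1), value)
    return [tuple(e.split(LEVEL_2)) for e in elements]

print(parse_array_of_structs("a:1$b:2"))  # prints [('a', '1'), ('b', '2')]
```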
commit b0a2fabcc8584dfba24ad0ea135948f5365a7335
Author: QiangCai <qiangcai@...>
Date: 2018-02-25T10:53:41Z
[CARBONDATA-2200] Fix bug of LIKE operation on streaming table
A LIKE operation is converted to a StartsWith / EndsWith / Contains
expression, and Carbon uses RowLevelFilterExecuterImpl to evaluate it.
The streaming table should therefore also implement
RowLevelFilterExecuterImpl.
This closes #1996
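The LIKE-to-expression rewrite mentioned above can be sketched generically (a simplified illustration, not Carbon's actual code): a pattern 'abc%' becomes StartsWith, '%abc' becomes EndsWith, and '%abc%' becomes Contains.

```python
def like_to_predicate(pattern):
    """Map the simple LIKE shapes to string predicates, sketching the
    StartsWith / EndsWith / Contains rewrite (general patterns omitted)."""
    if pattern.startswith("%") and pattern.endswith("%"):
        needle = pattern[1:-1]
        return lambda s: needle in s           # Contains
    if pattern.endswith("%"):
        needle = pattern[:-1]
        return lambda s: s.startswith(needle)  # StartsWith
    if pattern.startswith("%"):
        needle = pattern[1:]
        return lambda s: s.endswith(needle)    # EndsWith
    return lambda s: s == pattern              # exact match

rows = ["stream_a", "batch_a", "a_stream"]
print([r for r in rows if like_to_predicate("stream%")(r)])   # ['stream_a']
print([r for r in rows if like_to_predicate("%stream")(r)])   # ['a_stream']
print([r for r in rows if like_to_predicate("%stream%")(r)])  # ['stream_a', 'a_stream']
```

A row-level executor then simply applies the resulting predicate to each row value, which is why the streaming reader also needs that execution path.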
commit e363dd1a68e2138591a930055dd1701a1245825f
Author: rahulforallp <rahul.kumar@...>
Date: 2018-02-25T09:55:26Z
[CARBONDATA-2201] NPE fixed while triggering the LoadTablePreExecutionEvent
before Streaming
While triggering the LoadTablePreExecutionEvent we require both the options
provided by the user and the finalOptions.
In the streaming case the two are the same; passing null may cause an NPE.
This closes #1997
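The defensive fix can be sketched as passing the same (possibly empty) options map twice rather than null; the function and option names below are hypothetical:

```python
def fire_load_table_pre_execution_event(user_options, final_options):
    """Event consumers iterate both maps; a null here is what caused the NPE,
    so reject it up front."""
    assert user_options is not None and final_options is not None
    return {"user": dict(user_options), "final": dict(final_options)}

# Streaming case: user options and final options are the same map, so pass it
# twice (or pass an empty dict) instead of None.
options = {"bad_records_action": "FORCE"}
event = fire_load_table_pre_execution_event(options, options)
print(event["user"] == event["final"])  # prints True
```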
commit 0f210c86ca3ee9f0fa845cdeaef418ed9253b6f8
Author: Zhang Zhichao <441586683@...>
Date: 2018-02-04T04:54:24Z
[MINOR]Remove dependency of Java 1.8
This closes #1928
commit 758d03e783e324f70b6599be7feb1951b1034f51
Author: ravipesala <ravi.pesala@...>
Date: 2018-02-09T04:07:02Z
[CARBONDATA-2168] Support global sort for standard hive partitioning
This closes #1972
commit 1997ca235f90b5746262c9654b685b9b6bd3f16a
Author: ravipesala <ravi.pesala@...>
Date: 2018-02-14T19:01:56Z
[CARBONDATA-2187][PARTITION] Partition restructure for new folder structure
and supporting partition location feature
This closes #1984
commit b51d8186a82818672067dfd0387af6ff505f940c
Author: Jatin <jatin.demla@...>
Date: 2018-02-23T11:26:17Z
[CARBONDATA-2199] Fixed Dimension column after restructure getting wrong
block datatype
Problem: Changing the datatype of a measure that is in sort_columns triggers
a restructure; after the restructure the datatype is changed back to the
actual datatype, and accessing the data with the changed datatype gives an
incorrect-length exception.
Solution: Store the datatype in DimensionInfo while restructuring and
access the same datatype to get the block data type.
This closes #1993
commit 7726b4f9b379b0eec4b9fff6571415f47fa55587
Author: Jatin <jatin.demla@...>
Date: 2018-02-27T10:43:40Z
[CARBONDATA-2207] Fix testcases after using hive metastore
CarbonTable was coming back null in the hive metastore case, so fetch it
from the metastore instead of from carbon.
This closes #2005
commit b360f9084f873bc096d7fabfde20730fbc752350
Author: chenliang613 <chenliang613@...>
Date: 2018-02-08T17:32:38Z
[HOTFIX] Add partition usage code
This closes #1956
commit b9a6b68658fd0f7f408102374b3ef31dcfe44cea
Author: akashrn5 <akashnilugal@...>
Date: 2018-02-28T11:58:43Z
[CARBONDATA-2217]fix drop partition for non existing partition and set
FactTimeStamp during compaction for partition table
Problem:
1) When drop partition is fired for a column which does not exist, it throws
a null pointer exception.
2) select * does not work when the clean files operation is fired after a
second level of compaction; it sometimes throws an exception.
3) A new segment is created for all segments if any one partition is dropped.
Solution:
1) Add a null check for the case where the column does not exist.
2) Give the fact files a different timestamp during compaction, to avoid
deletion of the files during clean files.
3) Write a new segment file only for the partition which is dropped, not for
all partitions.
4) This PR also contains a fix for creating a pre-aggregate table with the
same name as one already created in another database.
This closes #2017
commit 660190fb544e338acd131e7cc30de171e7600df6
Author: akashrn5 <akashnilugal@...>
Date: 2018-02-28T12:08:50Z
[CARBONDATA-2103]Make show datamaps configurable in show tables command
Make the listing of datamaps in show tables configurable:
A new carbon property called carbon.query.show.datamaps is added; by default
it is true, so show tables lists all tables, including main tables and
datamaps.
To filter datamaps out of show tables, configure this property as false.
This closes #2015
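The property-gated filtering can be sketched as follows. Only the property name carbon.query.show.datamaps comes from the commit; the table representation and helper names are hypothetical:

```python
def show_tables(tables, props):
    """List table names; hide datamap tables unless the property allows them."""
    show_datamaps = props.get("carbon.query.show.datamaps", "true") == "true"
    if show_datamaps:
        return [t["name"] for t in tables]
    return [t["name"] for t in tables if not t["is_datamap"]]

tables = [
    {"name": "maintable", "is_datamap": False},
    {"name": "maintable_agg0", "is_datamap": True},  # pre-aggregate datamap
]

print(show_tables(tables, {}))  # default true: lists both tables
print(show_tables(tables, {"carbon.query.show.datamaps": "false"}))  # main table only
```

Checking a per-table flag also avoids the double lookup the earlier commit removed: the decision is made from metadata already in hand.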
commit 092b5d58a50498a0a66bf6166907965612eb1fc5
Author: ravipesala <ravi.pesala@...>
Date: 2018-03-01T06:34:53Z
[CARBONDATA-2219] Added validation for external partition location to use
same schema.
This closes #2018
----
---