Unsubscribeme

J 12323123 Thu, 29 Aug 2019 01:05:40 -0700

On Thu, 29 Aug 2019, 19:26 Raghunandan S, <[email protected]>
wrote:


> Hi All,
>
> Apache CarbonData community is pleased to announce the release of the
> Version 1.6.0 in The Apache Software Foundation (ASF).
>
> CarbonData is a high-performance data solution that supports various data
> analytic scenarios, including BI analysis, ad-hoc SQL query, fast filter
> lookup on detail record, streaming analytics, and so on. CarbonData has
> been deployed in many enterprise production environments. In one of the
> largest scenarios, it supports queries on a single table with 3PB data
> (more than 5 trillion records) with a response time of less than 3 seconds!
>
> We encourage you to use the release
> https://dist.apache.org/repos/dist/release/carbondata/1.6.0/ and to send
> feedback through the CarbonData user mailing lists <[email protected]>!
>
> This release note provides information on the new features, improvements,
> and bug fixes of this release.
>
> What’s New in CarbonData Version 1.6.0?
>
> CarbonData 1.6.0 is intended to move closer to unified analytics. We have
> added an index server to distribute the index cache. We have also added
> incremental loading on MV datamaps to improve datamap loading time.
> We now support reading CarbonData tables from Hive, and the SDK now
> supports the Arrow format.
>
> In this version of CarbonData, around 75 JIRA tickets related to new
> features, improvements, and bugs have been resolved. The following is a
> summary.
>
> Index Server to distribute the index cache and parallelise the index pruning
>
>
> Carbon currently prunes and caches all block/blocklet datamap index
> information in the driver. If the cache grows very large (70-80% of the
> driver memory), excessive GC in the driver can slow down queries, and
> the driver may even run out of memory. If multiple JDBC drivers want to
> read from the same tables, every JDBC server needs to maintain its own
> copy of the cache. To solve these problems we have introduced a
> distributed Index Cache Server. It is a separate, scalable server that
> stores only index information; all the drivers can connect to it and prune
> the data using the cached index information.
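
To check my reading of the Index Server idea before I go, here is a minimal Python sketch. It is not CarbonData code: `SharedIndexCache`, `BlockIndex`, `load_index` and the min/max layout are all hypothetical names I made up. It only illustrates one shared cache answering pruning requests from several drivers, with LRU eviction (the JIRA list mentions that LRU is mandatory with the IndexServer).

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockIndex:
    """Min/max statistics for one column of one data block (hypothetical layout)."""
    block_id: str
    min_value: int
    max_value: int


class SharedIndexCache:
    """Toy stand-in for a central index cache that many drivers could query."""

    def __init__(self, load_index, max_tables=2):
        self._load_index = load_index   # callback: table name -> list of BlockIndex
        self._max_tables = max_tables   # LRU capacity, counted in tables
        self._cache = OrderedDict()     # table name -> list of BlockIndex
        self.loads = 0                  # how often the index was actually read

    def _get(self, table):
        if table in self._cache:
            self._cache.move_to_end(table)       # mark as most recently used
        else:
            self._cache[table] = self._load_index(table)
            self.loads += 1
            if len(self._cache) > self._max_tables:
                self._cache.popitem(last=False)  # evict least recently used
        return self._cache[table]

    def prune(self, table, value):
        """Return ids of blocks whose [min, max] range may contain `value`."""
        return [b.block_id for b in self._get(table)
                if b.min_value <= value <= b.max_value]


def load_index(table):
    # Pretend to read index files; every table gets the same three blocks here.
    return [BlockIndex("block-0", 0, 99),
            BlockIndex("block-1", 100, 199),
            BlockIndex("block-2", 150, 299)]


server = SharedIndexCache(load_index)
# Two "drivers" pruning the same table share one cached copy of the index.
driver_a = server.prune("sales", 160)
driver_b = server.prune("sales", 42)
print(driver_a, driver_b, server.loads)
```

In this toy version both lookups are served from a single load of the index, which is the saving the announcement describes when each driver no longer keeps its own copy.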
>
> Incremental data loading on MV datamaps
>
> Previously, MV datamaps could only be loaded with a full load for any new
> data load on the parent table. We now support incremental loading on MV
> datamaps, so any new load on the parent table triggers a load on the MV
> datamap only for the incrementally added data.
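
The difference between a full and an incremental load can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the `IncrementalMV` class, the segment dictionary and the SUM/GROUP BY view are hypothetical and say nothing about how CarbonData implements it.

```python
from collections import defaultdict


class IncrementalMV:
    """Toy materialized view: SUM(amount) GROUP BY country over a segmented table."""

    def __init__(self):
        self.totals = defaultdict(int)  # country -> running sum
        self.loaded_segments = set()    # parent segments already folded in
        self.rows_scanned = 0           # bookkeeping to show the saving

    def refresh(self, parent_segments):
        """Fold in only the parent segments that have not been loaded yet."""
        for segment_id, rows in parent_segments.items():
            if segment_id in self.loaded_segments:
                continue                # skip data the view already covers
            for country, amount in rows:
                self.totals[country] += amount
                self.rows_scanned += 1
            self.loaded_segments.add(segment_id)


parent = {0: [("IN", 10), ("US", 5)]}
mv = IncrementalMV()
mv.refresh(parent)

parent[1] = [("IN", 7)]  # a new load on the parent table adds segment 1
mv.refresh(parent)

print(dict(mv.totals), mv.rows_scanned)
```

The second refresh reads only the one new row instead of rescanning segment 0. A SUM can be maintained this way directly; an average would need the sum and the count kept separately.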
>
> Supported Arrow format from Carbon SDK
>
> The SDK reader now supports reading carbondata files and filling them into
> Apache Arrow vectors. This helps to avoid unnecessary intermediate
> serialisations when accessing the data from other execution engines or
> languages.
>
> Supported read from Hive
>
> CarbonData files can now be read from Hive. This helps users on existing
> Hive deployments that use other formats to migrate easily to the
> CarbonData format.
>
> Behaviour Change
>
> None
>
>
> Please find the detailed JIRA list:
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220&version=12344965
>
>
> Sub-task
>
>    - [CARBONDATA-3306
>    <https://issues.apache.org/jira/browse/CARBONDATA-3306>] - Implement a
>    DistributableIndexPruneRDD and IndexPruneFileFormat
>    - [CARBONDATA-3337
>    <https://issues.apache.org/jira/browse/CARBONDATA-3337>] - Implement a
>    Hadoop RPC framework for communication
>    - [CARBONDATA-3338
>    <https://issues.apache.org/jira/browse/CARBONDATA-3338>] - Incremental
>    data load support to datamap on single table
>    - [CARBONDATA-3349
>    <https://issues.apache.org/jira/browse/CARBONDATA-3349>] - add is_sorted
>    and sort_columns information into show segments
>    - [CARBONDATA-3350
>    <https://issues.apache.org/jira/browse/CARBONDATA-3350>] - enhance
>    custom compaction to support resort single segment
>    - [CARBONDATA-3357
>    <https://issues.apache.org/jira/browse/CARBONDATA-3357>] - Support
>    TableProperties from single parent table and restrict
>    alter/delete/partition on mv
>    - [CARBONDATA-3378
>    <https://issues.apache.org/jira/browse/CARBONDATA-3378>] - Display
>    original query in Indexserver Job
>    - [CARBONDATA-3381
>    <https://issues.apache.org/jira/browse/CARBONDATA-3381>] - Large
>    response size Exception is thrown from index server.
>    - [CARBONDATA-3387
>    <https://issues.apache.org/jira/browse/CARBONDATA-3387>] - Support
>    Partition with MV datamap & Show DataMap Status
>    - [CARBONDATA-3392
>    <https://issues.apache.org/jira/browse/CARBONDATA-3392>] - Make use of
>    LRU mandatory when using IndexServer
>    - [CARBONDATA-3398
>    <https://issues.apache.org/jira/browse/CARBONDATA-3398>] - Implement
>    Show Cache for IndexServer and MV
>    - [CARBONDATA-3399
>    <https://issues.apache.org/jira/browse/CARBONDATA-3399>] - Implement
>    Executor ID based task distribution for Index Server
>    - [CARBONDATA-3402
>    <https://issues.apache.org/jira/browse/CARBONDATA-3402>] - Block complex
>    data types and validate dmproperties in mv
>    - [CARBONDATA-3408
>    <https://issues.apache.org/jira/browse/CARBONDATA-3408>] - CarbonSession
>    partition support binary data type
>    - [CARBONDATA-3409
>    <https://issues.apache.org/jira/browse/CARBONDATA-3409>] - Fix
>    Concurrent dataloading Issue with mv
>    - [CARBONDATA-3423
>    <https://issues.apache.org/jira/browse/CARBONDATA-3423>] - Validate
>    dictionary for binary data type
>    - [CARBONDATA-3440
>    <https://issues.apache.org/jira/browse/CARBONDATA-3440>] - Expose a DDL
>    to add index size and data size to tableStatus for legacy segments
>    - [CARBONDATA-3459
>    <https://issues.apache.org/jira/browse/CARBONDATA-3459>] - Fixed id
>    based distribution for show cache command
>    - [CARBONDATA-3462
>    <https://issues.apache.org/jira/browse/CARBONDATA-3462>] - Add usage and
>    deployment document for index server
>
> Bug
>
>    - [CARBONDATA-3247
>    <https://issues.apache.org/jira/browse/CARBONDATA-3247>] - Support to
>    select all columns when creating MV datamap
>    - [CARBONDATA-3291
>    <https://issues.apache.org/jira/browse/CARBONDATA-3291>] - MV datamap
>    doesn't take effect when the same table is joined
>    - [CARBONDATA-3294
>    <https://issues.apache.org/jira/browse/CARBONDATA-3294>] - MV datamap
>    throw error when using count(1) and case when expression
>    - [CARBONDATA-3295
>    <https://issues.apache.org/jira/browse/CARBONDATA-3295>] - MV datamap
>    throw exception because its rewrite algorithm when multiply subquery
>    - [CARBONDATA-3303
>    <https://issues.apache.org/jira/browse/CARBONDATA-3303>] - MV datamap
>    return wrong results when using coalesce and less groupby columns
>    - [CARBONDATA-3317
>    <https://issues.apache.org/jira/browse/CARBONDATA-3317>] - Executing
>    'show segments' command throws NPE when spark streaming app write data
>    to new stream segment.
>    - [CARBONDATA-3356
>    <https://issues.apache.org/jira/browse/CARBONDATA-3356>] - There are
>    some exception when carbonData DataSource read SDK files with varchar
>    - [CARBONDATA-3364
>    <https://issues.apache.org/jira/browse/CARBONDATA-3364>] - Support Read
>    from Hive. Queries are giving empty results from hive.
>    - [CARBONDATA-3367
>    <https://issues.apache.org/jira/browse/CARBONDATA-3367>] - OOM when huge
>    number of carbondata files are read from SDK reader
>    - [CARBONDATA-3368
>    <https://issues.apache.org/jira/browse/CARBONDATA-3368>] - InferSchema
>    from datafile instead of index file
>    - [CARBONDATA-3380
>    <https://issues.apache.org/jira/browse/CARBONDATA-3380>] - Fix missing
>    appName and AnalysisException bug in DirectSQLExample
>    - [CARBONDATA-3382
>    <https://issues.apache.org/jira/browse/CARBONDATA-3382>] - Fix
>    compressor type displayed in desc formatted
>    - [CARBONDATA-3384
>    <https://issues.apache.org/jira/browse/CARBONDATA-3384>] - Delete/Update
>    is throwing NullPointerException when index server is enabled.
>    - [CARBONDATA-3393
>    <https://issues.apache.org/jira/browse/CARBONDATA-3393>] - Merge Index
>    Job Failure should not trigger the merge index job again. Exception
>    propagation should be decided by the User.
>    - [CARBONDATA-3395
>    <https://issues.apache.org/jira/browse/CARBONDATA-3395>] - When same
>    split object is passed to concurrent readers, build() fails randomly
>    with Exception.
>    - [CARBONDATA-3396
>    <https://issues.apache.org/jira/browse/CARBONDATA-3396>] - Range
>    Compaction Data mismatch
>    - [CARBONDATA-3397
>    <https://issues.apache.org/jira/browse/CARBONDATA-3397>] - Remove
>    SparkUnknown Expression to Index Server
>    - [CARBONDATA-3400
>    <https://issues.apache.org/jira/browse/CARBONDATA-3400>] - Support
>    IndexServer for Spark-Shell in secure Kerberos mode
>    - [CARBONDATA-3403
>    <https://issues.apache.org/jira/browse/CARBONDATA-3403>] - MV is not
>    working for like and filter AND and OR queries
>    - [CARBONDATA-3405
>    <https://issues.apache.org/jira/browse/CARBONDATA-3405>] - SDK reader
>    getSplits() must clear the cache.
>    - [CARBONDATA-3406
>    <https://issues.apache.org/jira/browse/CARBONDATA-3406>] - Support
>    Binary, Boolean, Varchar, Complex data types read and Dictionary columns
>    read
>    - [CARBONDATA-3407
>    <https://issues.apache.org/jira/browse/CARBONDATA-3407>] - distinct,
>    count, Sum query fails when MV is created on single projection column
>    - [CARBONDATA-3416
>    <https://issues.apache.org/jira/browse/CARBONDATA-3416>] - When new
>    analyzer rule added in spark, not reflecting in carbon
>    - [CARBONDATA-3417
>    <https://issues.apache.org/jira/browse/CARBONDATA-3417>] - Load time
>    degrade for Range column due to cores configured
>    - [CARBONDATA-3418
>    <https://issues.apache.org/jira/browse/CARBONDATA-3418>] - Inherit
>    Column Compressor Property from parent table to its child table's
>    - [CARBONDATA-3419
>    <https://issues.apache.org/jira/browse/CARBONDATA-3419>] - Desc
>    Formatted not showing Range Column
>    - [CARBONDATA-3424
>    <https://issues.apache.org/jira/browse/CARBONDATA-3424>] - There are
>    improper exception when query with avg(substr(binary data type)).
>    - [CARBONDATA-3426
>    <https://issues.apache.org/jira/browse/CARBONDATA-3426>] - Fix Load
>    performance degrade by fixing task distribution
>    - [CARBONDATA-3429
>    <https://issues.apache.org/jira/browse/CARBONDATA-3429>] - CarbonCli on
>    wrong segment path wrong error message is displayed
>    - [CARBONDATA-3432
>    <https://issues.apache.org/jira/browse/CARBONDATA-3432>] - Range Column
>    compaction sending all the splits to all the executors one by one
>    - [CARBONDATA-3433
>    <https://issues.apache.org/jira/browse/CARBONDATA-3433>] - MV has issues
>    when create on constant column, duplicate columns and limit queries
>    - [CARBONDATA-3436
>    <https://issues.apache.org/jira/browse/CARBONDATA-3436>] - update pre
>    insert into rule as per spark
>    - [CARBONDATA-3437
>    <https://issues.apache.org/jira/browse/CARBONDATA-3437>] - Map
>    Implementation not correct
>    - [CARBONDATA-3442
>    <https://issues.apache.org/jira/browse/CARBONDATA-3442>] - Fix creating
>    mv datamap with column name having length more than 128
>    - [CARBONDATA-3453
>    <https://issues.apache.org/jira/browse/CARBONDATA-3453>] - Set segment
>    doesn't work with adaptive execution
>    - [CARBONDATA-3455
>    <https://issues.apache.org/jira/browse/CARBONDATA-3455>] - Job Group ID
>    is not displayed in the IndexServer
>    - [CARBONDATA-3456
>    <https://issues.apache.org/jira/browse/CARBONDATA-3456>] - Fix
>    DataLoading on MV when Yarn-Application is killed
>    - [CARBONDATA-3457
>    <https://issues.apache.org/jira/browse/CARBONDATA-3457>] - [MV]Fix
>    Column not found with Cast Expression
>    - [CARBONDATA-3458
>    <https://issues.apache.org/jira/browse/CARBONDATA-3458>] - Running load,
>    insert, CTAS command on carbon table sets double Execution ID info, and
>    ID of CTAS is null
>    - [CARBONDATA-3460
>    <https://issues.apache.org/jira/browse/CARBONDATA-3460>] - EOF exception
>    is thrown when querying using index server
>    - [CARBONDATA-3467
>    <https://issues.apache.org/jira/browse/CARBONDATA-3467>] - Fix count(*)
>    with filter on string value
>    - [CARBONDATA-3474
>    <https://issues.apache.org/jira/browse/CARBONDATA-3474>] - Fix validate
>    mvQuery having filter expression and correct error message
>    - [CARBONDATA-3476
>    <https://issues.apache.org/jira/browse/CARBONDATA-3476>] - Read time and
>    scan time stats shown wrong in executor log for filter query
>    - [CARBONDATA-3477
>    <https://issues.apache.org/jira/browse/CARBONDATA-3477>] - Throw out
>    exception when use sql: 'update table select\n...'
>    - [CARBONDATA-3478
>    <https://issues.apache.org/jira/browse/CARBONDATA-3478>] - Fix
>    ArrayIndexOutOfBoundsException issue on compaction after alter rename
>    operation
>    - [CARBONDATA-3481
>    <https://issues.apache.org/jira/browse/CARBONDATA-3481>] - Multi-thread
>    pruning fails when datamaps count is just near numOfThreadsForPruning
>    - [CARBONDATA-3482
>    <https://issues.apache.org/jira/browse/CARBONDATA-3482>] - Null pointer
>    exception when concurrent select queries are executed from different
>    beeline terminals.
>    - [CARBONDATA-3483
>    <https://issues.apache.org/jira/browse/CARBONDATA-3483>] - Can not run
>    horizontal compaction when execute update sql
>    - [CARBONDATA-3486
>    <https://issues.apache.org/jira/browse/CARBONDATA-3486>] -
>    Serialization/ deserialization issue with Datatype
>    - [CARBONDATA-3490
>    <https://issues.apache.org/jira/browse/CARBONDATA-3490>] - Concurrent
>    data load failure with carbondata FileNotFound exception
>    - [CARBONDATA-3493
>    <https://issues.apache.org/jira/browse/CARBONDATA-3493>] - Carbon query
>    fails when enable.query.statistics is true in specific scenario.
>
> New Feature
>
>    - [CARBONDATA-3404
>    <https://issues.apache.org/jira/browse/CARBONDATA-3404>] - Support
>    CarbonFile API for configuring custom file systems
>
> Improvement
>
>    - [CARBONDATA-3309
>    <https://issues.apache.org/jira/browse/CARBONDATA-3309>] - MV datamap
>    adapt to spark 2.1 version
>    - [CARBONDATA-3365
>    <https://issues.apache.org/jira/browse/CARBONDATA-3365>] - Support
>    Apache arrow vector filling from carbondata SDK
>    - [CARBONDATA-3447
>    <https://issues.apache.org/jira/browse/CARBONDATA-3447>] - Index Server
>    Performance Improvement
>    - [CARBONDATA-3488
>    <https://issues.apache.org/jira/browse/CARBONDATA-3488>] - Check the
>    file size after move local file to carbon path
>
>
> Regards
> Raghunandan
>

Unsubscribeme
