On Thu, 29 Aug 2019, 19:26 Raghunandan S, <[email protected]> wrote:
> Hi All,
>
> The Apache CarbonData community is pleased to announce the release of
> version 1.6.0 in The Apache Software Foundation (ASF).
>
> CarbonData is a high-performance data solution that supports various data
> analytic scenarios, including BI analysis, ad-hoc SQL query, fast filter
> lookup on detail record, streaming analytics, and so on. CarbonData has
> been deployed in many enterprise production environments. In one of the
> largest scenarios, it supports queries on a single table with 3PB of data
> (more than 5 trillion records) with a response time of less than 3 seconds!
>
> We encourage you to use the release
> https://dist.apache.org/repos/dist/release/carbondata/1.6.0/ and to send
> feedback through the CarbonData user mailing list <[email protected]>!
>
> This release note provides information on the new features, improvements,
> and bug fixes of this release.
>
> What’s New in CarbonData Version 1.6.0?
>
> The intention of CarbonData 1.6.0 was to move closer to unified analytics.
> We have added an index server to distribute the index cache. We have also
> added support for incremental loading on MV datamaps to improve datamap
> loading time. We now support reading CarbonData tables from Hive, as well
> as the Arrow format from the SDK.
>
> In this version of CarbonData, around 75 JIRA tickets related to new
> features, improvements, and bugs have been resolved. The following is a
> summary.
>
> Index Server to distribute the index cache and parallelise the index
> pruning
>
> Carbon currently prunes and caches all block/blocklet datamap index
> information in the driver. If the cache size becomes huge (70-80% of the
> driver memory), there can be excessive GC in the driver, which can slow
> down queries, and the driver may even run out of memory. If multiple JDBC
> drivers want to read from the same tables, every JDBC server needs to
> maintain its own copy of the cache. To solve these problems we have
> introduced a distributed Index Cache Server.
> It is a separate, scalable server that stores only index information; all
> the drivers can connect to it and prune the data using the cached index
> information.
>
> Incremental data loading on MV datamaps
>
> Previously, MV datamaps could only be loaded with a full load for any new
> data load on the parent table. We now support incremental loading on MV
> datamaps, so any new load on the parent table triggers a load on the MV
> datamap only for the incrementally added data.
>
> Supported Arrow format from Carbon SDK
>
> The SDK reader now supports reading carbondata files and filling them into
> Apache Arrow vectors. This helps to avoid unnecessary intermediate
> serialisations when accessing the data from other execution engines or
> languages.
>
> Supported read from Hive
>
> CarbonData files can be read from Hive. This helps users to easily
> migrate to the CarbonData format on existing Hive deployments that use
> other formats.
>
> Behaviour Change
>
> None
>
> Please find the detailed JIRA list:
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220&version=12344965
>
> Sub-task
>
> - [CARBONDATA-3306 <https://issues.apache.org/jira/browse/CARBONDATA-3306>] - Implement a DistributableIndexPruneRDD and IndexPruneFileFormat
> - [CARBONDATA-3337 <https://issues.apache.org/jira/browse/CARBONDATA-3337>] - Implement a Hadoop RPC framework for communication
> - [CARBONDATA-3338 <https://issues.apache.org/jira/browse/CARBONDATA-3338>] - Incremental data load support to datamap on single table
> - [CARBONDATA-3349 <https://issues.apache.org/jira/browse/CARBONDATA-3349>] - add is_sorted and sort_columns information into show segments
> - [CARBONDATA-3350 <https://issues.apache.org/jira/browse/CARBONDATA-3350>] - enhance custom compaction to support resort single segment
> - [CARBONDATA-3357 <https://issues.apache.org/jira/browse/CARBONDATA-3357>] - Support TableProperties from single parent table and restrict alter/delete/partition on mv
> - [CARBONDATA-3378
> <https://issues.apache.org/jira/browse/CARBONDATA-3378>] - Display original query in Indexserver Job
> - [CARBONDATA-3381 <https://issues.apache.org/jira/browse/CARBONDATA-3381>] - Large response size Exception is thrown from index server.
> - [CARBONDATA-3387 <https://issues.apache.org/jira/browse/CARBONDATA-3387>] - Support Partition with MV datamap & Show DataMap Status
> - [CARBONDATA-3392 <https://issues.apache.org/jira/browse/CARBONDATA-3392>] - Make use of LRU mandatory when using IndexServer
> - [CARBONDATA-3398 <https://issues.apache.org/jira/browse/CARBONDATA-3398>] - Implement Show Cache for IndexServer and MV
> - [CARBONDATA-3399 <https://issues.apache.org/jira/browse/CARBONDATA-3399>] - Implement Executor ID based task distribution for Index Server
> - [CARBONDATA-3402 <https://issues.apache.org/jira/browse/CARBONDATA-3402>] - Block complex data types and validate dmproperties in mv
> - [CARBONDATA-3408 <https://issues.apache.org/jira/browse/CARBONDATA-3408>] - CarbonSession partition support binary data type
> - [CARBONDATA-3409 <https://issues.apache.org/jira/browse/CARBONDATA-3409>] - Fix Concurrent dataloading Issue with mv
> - [CARBONDATA-3423 <https://issues.apache.org/jira/browse/CARBONDATA-3423>] - Validate dictionary for binary data type
> - [CARBONDATA-3440 <https://issues.apache.org/jira/browse/CARBONDATA-3440>] - Expose a DDL to add index size and data size to tableStatus for legacy segments
> - [CARBONDATA-3459 <https://issues.apache.org/jira/browse/CARBONDATA-3459>] - Fixed id based distribution for show cache command
> - [CARBONDATA-3462 <https://issues.apache.org/jira/browse/CARBONDATA-3462>] - Add usage and deployment document for index server
>
> Bug
>
> - [CARBONDATA-3247 <https://issues.apache.org/jira/browse/CARBONDATA-3247>] - Support to select all columns when creating MV datamap
> - [CARBONDATA-3291 <https://issues.apache.org/jira/browse/CARBONDATA-3291>] - MV datamap
> doesn't take effect when the same table join
> - [CARBONDATA-3294 <https://issues.apache.org/jira/browse/CARBONDATA-3294>] - MV datamap throw error when using count(1) and case when expression
> - [CARBONDATA-3295 <https://issues.apache.org/jira/browse/CARBONDATA-3295>] - MV datamap throw exception because its rewrite algorithm when multiply subquery
> - [CARBONDATA-3303 <https://issues.apache.org/jira/browse/CARBONDATA-3303>] - MV datamap return wrong results when using coalesce and less groupby columns
> - [CARBONDATA-3317 <https://issues.apache.org/jira/browse/CARBONDATA-3317>] - Executing 'show segments' command throws NPE when spark streaming app write data to new stream segment.
> - [CARBONDATA-3356 <https://issues.apache.org/jira/browse/CARBONDATA-3356>] - There are some exception when carbonData DataSource read SDK files with varchar
> - [CARBONDATA-3364 <https://issues.apache.org/jira/browse/CARBONDATA-3364>] - Support Read from Hive. Queries are giving empty results from hive.
> - [CARBONDATA-3367 <https://issues.apache.org/jira/browse/CARBONDATA-3367>] - OOM when huge number of carbondata files are read from SDK reader
> - [CARBONDATA-3368 <https://issues.apache.org/jira/browse/CARBONDATA-3368>] - InferSchema from datafile instead of index file
> - [CARBONDATA-3380 <https://issues.apache.org/jira/browse/CARBONDATA-3380>] - Fix missing appName and AnalysisException bug in DirectSQLExample
> - [CARBONDATA-3382 <https://issues.apache.org/jira/browse/CARBONDATA-3382>] - Fix compressor type displayed in desc formatted
> - [CARBONDATA-3384 <https://issues.apache.org/jira/browse/CARBONDATA-3384>] - Delete/Update is throwing NullPointerException when index server is enabled.
> - [CARBONDATA-3393 <https://issues.apache.org/jira/browse/CARBONDATA-3393>] - Merge Index Job Failure should not trigger the merge index job again. Exception propagation should be decided by the User.
> - [CARBONDATA-3395 <https://issues.apache.org/jira/browse/CARBONDATA-3395>] - When same split object is passed to concurrent readers, build() fails randomly with Exception.
> - [CARBONDATA-3396 <https://issues.apache.org/jira/browse/CARBONDATA-3396>] - Range Compaction Data mismatch
> - [CARBONDATA-3397 <https://issues.apache.org/jira/browse/CARBONDATA-3397>] - Remove SparkUnknown Expression to Index Server
> - [CARBONDATA-3400 <https://issues.apache.org/jira/browse/CARBONDATA-3400>] - Support IndexServer for Spark-Shell in secure Kerberos mode
> - [CARBONDATA-3403 <https://issues.apache.org/jira/browse/CARBONDATA-3403>] - MV is not working for like and filter AND and OR queries
> - [CARBONDATA-3405 <https://issues.apache.org/jira/browse/CARBONDATA-3405>] - SDK reader getSplits() must clear the cache.
> - [CARBONDATA-3406 <https://issues.apache.org/jira/browse/CARBONDATA-3406>] - Support Binary, Boolean, Varchar, Complex data types read and Dictionary columns read
> - [CARBONDATA-3407 <https://issues.apache.org/jira/browse/CARBONDATA-3407>] - distinct, count, Sum query fails when MV is created on single projection column
> - [CARBONDATA-3416 <https://issues.apache.org/jira/browse/CARBONDATA-3416>] - When new analyzer rule added in spark, not reflecting in carbon
> - [CARBONDATA-3417 <https://issues.apache.org/jira/browse/CARBONDATA-3417>] - Load time degrade for Range column due to cores configured
> - [CARBONDATA-3418 <https://issues.apache.org/jira/browse/CARBONDATA-3418>] - Inherit Column Compressor Property from parent table to its child tables
> - [CARBONDATA-3419 <https://issues.apache.org/jira/browse/CARBONDATA-3419>] - Desc Formatted not showing Range Column
> - [CARBONDATA-3424 <https://issues.apache.org/jira/browse/CARBONDATA-3424>] - There are improper exception when query with avg(substr(binary data type)).
> - [CARBONDATA-3426 <https://issues.apache.org/jira/browse/CARBONDATA-3426>] - Fix Load performance degrade by fixing task distribution
> - [CARBONDATA-3429 <https://issues.apache.org/jira/browse/CARBONDATA-3429>] - CarbonCli on wrong segment path wrong error message is displayed
> - [CARBONDATA-3432 <https://issues.apache.org/jira/browse/CARBONDATA-3432>] - Range Column compaction sending all the splits to all the executors one by one
> - [CARBONDATA-3433 <https://issues.apache.org/jira/browse/CARBONDATA-3433>] - MV has issues when create on constant column, duplicate columns and limit queries
> - [CARBONDATA-3436 <https://issues.apache.org/jira/browse/CARBONDATA-3436>] - update pre insert into rule as per spark
> - [CARBONDATA-3437 <https://issues.apache.org/jira/browse/CARBONDATA-3437>] - Map Implementation not correct
> - [CARBONDATA-3442 <https://issues.apache.org/jira/browse/CARBONDATA-3442>] - Fix creating mv datamap with column name having length more than 128
> - [CARBONDATA-3453 <https://issues.apache.org/jira/browse/CARBONDATA-3453>] - Set segment doesn't work with adaptive execution
> - [CARBONDATA-3455 <https://issues.apache.org/jira/browse/CARBONDATA-3455>] - Job Group ID is not displayed in the IndexServer
> - [CARBONDATA-3456 <https://issues.apache.org/jira/browse/CARBONDATA-3456>] - Fix DataLoading on MV when Yarn-Application is killed
> - [CARBONDATA-3457 <https://issues.apache.org/jira/browse/CARBONDATA-3457>] - [MV]Fix Column not found with Cast Expression
> - [CARBONDATA-3458 <https://issues.apache.org/jira/browse/CARBONDATA-3458>] - Running load, insert, CTAS command on carbon table sets double Execution ID info, and ID of CTAS is null
> - [CARBONDATA-3460 <https://issues.apache.org/jira/browse/CARBONDATA-3460>] - EOF exception is thrown when querying using index server
> - [CARBONDATA-3467 <https://issues.apache.org/jira/browse/CARBONDATA-3467>] - Fix count(*) with filter on
> string value
> - [CARBONDATA-3474 <https://issues.apache.org/jira/browse/CARBONDATA-3474>] - Fix validate mvQuery having filter expression and correct error message
> - [CARBONDATA-3476 <https://issues.apache.org/jira/browse/CARBONDATA-3476>] - Read time and scan time stats shown wrong in executor log for filter query
> - [CARBONDATA-3477 <https://issues.apache.org/jira/browse/CARBONDATA-3477>] - Throw out exception when use sql: 'update table select\n...'
> - [CARBONDATA-3478 <https://issues.apache.org/jira/browse/CARBONDATA-3478>] - Fix ArrayIndexOutOfBoundsException issue on compaction after alter rename operation
> - [CARBONDATA-3481 <https://issues.apache.org/jira/browse/CARBONDATA-3481>] - Multi-thread pruning fails when datamaps count is just near numOfThreadsForPruning
> - [CARBONDATA-3482 <https://issues.apache.org/jira/browse/CARBONDATA-3482>] - Null pointer exception when concurrent select queries are executed from different beeline terminals.
> - [CARBONDATA-3483 <https://issues.apache.org/jira/browse/CARBONDATA-3483>] - Can not run horizontal compaction when execute update sql
> - [CARBONDATA-3486 <https://issues.apache.org/jira/browse/CARBONDATA-3486>] - Serialization/deserialization issue with Datatype
> - [CARBONDATA-3490 <https://issues.apache.org/jira/browse/CARBONDATA-3490>] - Concurrent data load failure with carbondata FileNotFound exception
> - [CARBONDATA-3493 <https://issues.apache.org/jira/browse/CARBONDATA-3493>] - Carbon query fails when enable.query.statistics is true in specific scenario.
>
> New Feature
>
> - [CARBONDATA-3404 <https://issues.apache.org/jira/browse/CARBONDATA-3404>] - Support CarbonFile API for configuring custom file systems
>
> Improvement
>
> - [CARBONDATA-3309 <https://issues.apache.org/jira/browse/CARBONDATA-3309>] - MV datamap adapt to spark 2.1 version
> - [CARBONDATA-3365 <https://issues.apache.org/jira/browse/CARBONDATA-3365>] - Support Apache arrow vector filling from carbondata SDK
> - [CARBONDATA-3447 <https://issues.apache.org/jira/browse/CARBONDATA-3447>] - Index Server Performance Improvement
> - [CARBONDATA-3488 <https://issues.apache.org/jira/browse/CARBONDATA-3488>] - Check the file size after move local file to carbon path
>
> Regards
> Raghunandan
