Similar comment regarding the file format specification. It looks like this is derived from the Parquet file format.
Which is fine as long as we follow the terms of the license:
https://github.com/apache/parquet-format/blob/master/LICENSE#L101

(c) You must retain, in the Source form of any Derivative Works that You
distribute, all copyright, patent, trademark, and attribution notices from
the Source form of the Work, excluding those notices that do not pertain to
any part of the Derivative Works; and

For example
CarbonData:
https://github.com/HuaweiBigData/carbondata/wiki/CarbonData-File-Structure-and-Format
https://github.com/HuaweiBigData/carbondata/blob/master/format/src/main/thrift/carbondata.thrift
Parquet:
https://github.com/apache/parquet-format/blob/master/README.md
https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift

On Thu, May 19, 2016 at 3:11 PM, Julian Hyde <jh...@apache.org> wrote:

> I see code derived from Mondrian in the org.carbondata.core.carbon
> package[1] (I’m familiar with Mondrian’s code structure because I wrote
> it). Mondrian was originally EPL and as such cannot be re-licensed under
> ASL. Everything is probably fine, but as part of incubation, we will need
> to make sure that this and other code has a clear progeny.
>
> Julian
>
> [1]
> https://github.com/HuaweiBigData/carbondata/tree/master/core/src/main/java/org/carbondata/core/carbon
>
> > On May 19, 2016, at 10:04 AM, Liang Chen <chenliang...@huawei.com> wrote:
> >
> > Hi Lars
> >
> > Thanks for you participated in discussion.
> >
> > Based on the below requirements, we investigated existing file formats in
> > the Hadoop eco-system, but we could not find a suitable solution that
> > satisfying requirements all at the same time, so we start designing
> > CarbonData.
> > R1.Support big scan & only fetch a few columns
> > R2.Support primary key lookup response in sub-second.
> > R3.Support interactive OLAP-style query over big data which involve many
> > filters in a query, this type of workload should response in seconds.
> > R4.Support fast individual record extraction which fetch all columns of
> > the record.
> > R5.Support HDFS so that customer can leverage existing Hadoop cluster.
> >
> > When we investigate Parquet/ORC, it seems they work very well for R1 and
> > R5, but they does not meet for R2,R3,R4. So we designed CarbonData mainly
> > to add following differentiating features:
> >
> > 1.Stores data along with index: it can significantly accelerate query
> > performance and reduces the I/O scans and CPU resources, where there are
> > filters in the query. CarbonData index is consisted of multiple level, a
> > processing framework can leverage this index to reduce the task it needs
> > to schedule and process, and it can also do skip scan in more finer grain
> > unit (called blocklet) in task side scanning instead of scanning the
> > whole file.
> >
> > 2.Operable encoded data :Through supporting efficient compression and
> > global encoding schemes, can query on compressed/encoded data, the data
> > can be converted just before returning the results to the users, which is
> > "late materialized".
> >
> > 3.Column group: Allow multiple columns form a column group to store as
> > row format, thus cost of column reconstructing is reduced.
> >
> > 4.Supports for various use cases with one single Data format : like
> > interactive OLAP-style query, Sequential Access (big scan), Random Access
> > (narrow scan).
> >
> > Please kindly let me know if the above info answer your questions.
> >
> > Regards
> > Liang
> >
> > --
> > View this message in context:
> > http://apache-incubator-general.996316.n3.nabble.com/DISCUSS-CarbonData-incubation-proposal-tp49643p49652.html
> > Sent from the Apache Incubator - General mailing list archive at
> > Nabble.com.
--
Julien