Hi Vimal,

The design doc looks clear. Can you also add the file format storage design for the Map data type?

Regards,
Ravi.

On 17 October 2016 at 07:43, Liang Chen <chenliang6...@gmail.com> wrote:

> Hi Vimal
>
> Thank you for starting the discussion.
> Since keys of Map data can only be primitive, can you list the types that
> will be supported? (Int, String, Double, ...)
>
> To make the discussion more convenient, you can go ahead and use Google Docs.
> After the design document is finalized, please archive and upload it to
> cwiki: https://cwiki.apache.org/confluence/display/CARBONDATA/CarbonData+Home
>
> Regards
> Liang
>
>
> Vimal Das Kammath wrote
> > Hi All,
> >
> > This discussion is regarding support for the Map data type in Carbon Data.
> >
> > Carbon Data supports complex and nested data types such as Arrays and
> > Structs. However, Carbon Data does not support other complex data types
> > such as Maps and Unions, which are generally supported by popular open
> > source file formats.
> >
> > Supporting the Map data type will require changes/additions to the DDL,
> > Query Syntax, Data Loading and Storage.
> >
> > I have hosted the design on Google Docs for review and discussion.
> >
> > https://docs.google.com/document/d/1U6wPohvdDHk0B7bONnVHWa6PKG8R9q5-oKMqzMMQHYY/edit?usp=sharing
> >
> > Below is the same inline.
> >
> >
> > 1. DDL Changes
> >
> > Maps are key->value data types where the value can be fetched by
> > providing the key. Hence we need to restrict keys to primitive data types,
> > whereas values can be of any data type supported in Carbon (primitive and
> > complex).
> >
> > Map data types can be defined in the create table DDL as:
> >
> >     MAP<primitive_data_type, data_type>
> >
> > For example:
> >
> >     create table example_table (id Int, name String, salary Int,
> >     salary_breakup map<String, Int>, city String)
> >
> >
> > 2. Data Loading Changes
> >
> > Carbon should be able to support loading data into tables with Map type
> > columns from csv files. It should be possible to represent maps in a
> > single row of csv.
> > This will need Carbon to support specifying the delimiters:
> >
> > 1. Between two Key-Value pairs
> >
> > 2. Between each Key and Value in a pair
> >
> > As Carbon already supports the Struct and Array complex types, the data
> > loading process already provides support for defining delimiters for
> > complex data types. Carbon provides two optional parameters for data
> > loading:
> >
> > 1. COMPLEX_DELIMITER_LEVEL_1: will define the delimiter between two
> >    Key-Value pairs
> >
> >     OPTIONS('COMPLEX_DELIMITER_LEVEL_1'='$')
> >
> > 2. COMPLEX_DELIMITER_LEVEL_2: will define the delimiter between each
> >    Key and Value in a pair
> >
> >     OPTIONS('COMPLEX_DELIMITER_LEVEL_2'=':')
> >
> > With these delimiter options, the map below
> >
> >     Fixed->100,000
> >     Bonus->30,000
> >     Stock->40,000
> >
> > can be represented as
> >
> >     Fixed:100,000$Bonus:30,000$Stock:40,000
> >
> > in the csv file.
> >
> >
> > 3. Query Capabilities
> >
> > A complex data type like Map will require additional operators to be
> > supported in the query language to fully utilize the strength of the data
> > type.
> >
> > Maps are sequences of key-value pairs, hence should support looking up
> > the value for a given key. Users could use the ColumnName["key"] syntax
> > to look up values in a map column. For example, salary_breakup["Fixed"]
> > could be used to fetch only the Fixed component of the salary breakup.
> >
> > In addition, we also need to define how maps can be used in existing
> > constructs such as select, where (filter), group by, etc.
> >
> > 1. Select: A Map column can be selected directly, or only the value
> >    for a given key can be selected, as per the requirement. For example:
> >
> >     Select name, salary, salary_breakup
> >
> >    will return the content of the map along with each row.
> >
> >     Select name, salary, salary_breakup["Fixed"]
> >
> >    will return only the one value from the map whose key is "Fixed".
> >
> > 2. Filter: A Map column cannot be used directly in a where clause, as
> >    a where clause can operate only on primitive data types. However, the
> >    map lookup operator can be used in where clauses. For example:
> >
> >     Select name, salary where salary_breakup["Bonus"]>10,000
> >
> >    *Note: if the value is not of primitive type, further accessor
> >    operators need to be used, depending on the type of the value, to
> >    arrive at a primitive type for the filter expression to be valid.*
> >
> > 3. Group By: Just like with filters, maps cannot be used directly in
> >    a group by clause; however, the lookup operator can be used.
> >
> > 4. Functions: A size() function can be provided for map types to
> >    determine the number of key-value pairs in a map.
> >
> >
> > 4. Storage changes
> >
> > As Carbon is a columnar data store, Map values will be stored using 3
> > physical columns:
> >
> > 1. One column for representing the Map data type. It will store the
> >    number of fields and the start index, the same way as is done for
> >    Structs and Arrays.
> >
> > 2. One column for the key.
> >
> > 3. One column for the value, if the value is of a primitive data type;
> >    otherwise the value itself will be multiple physical columns,
> >    depending on the data type of the value.
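
The three-column layout proposed in the storage section above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not CarbonData's writer code: `flatten_maps` and `lookup` are hypothetical helper names, and the start index is assumed to be 1-based.

```python
from typing import Dict, List, Optional, Tuple

Parent = List[Tuple[int, int]]  # one (number of entries, start index) pair per row


def flatten_maps(rows: List[Dict[str, int]]) -> Tuple[Parent, List[str], List[int]]:
    """Flatten one map per row into parent, key and value columns (hypothetical helper)."""
    parent: Parent = []
    keys: List[str] = []
    values: List[int] = []
    for row in rows:
        start = len(keys) + 1  # assumption: 1-based index into the key/value columns
        parent.append((len(row), start))
        for key, value in row.items():
            keys.append(key)
            values.append(value)
    return parent, keys, values


def lookup(parent: Parent, keys: List[str], values: List[int],
           row_index: int, key: str) -> Optional[int]:
    """Return the value stored for `key` in the given row, or None if the key is absent."""
    count, start = parent[row_index]
    for i in range(start - 1, start - 1 + count):
        if keys[i] == key:
            return values[i]
    return None


rows = [
    {"Fixed": 100000, "Bonus": 30000, "Stock": 40000},
    {"Fixed": 140000, "Bonus": 30000},
    {"Fixed": 120000, "Bonus": 20000, "Stock": 30000},
]
parent, keys, values = flatten_maps(rows)
print(parent)                                   # [(3, 1), (2, 4), (3, 6)]
print(lookup(parent, keys, values, 2, "Bonus")) # 20000
```

The parent column only records where each row's entries sit in the two child columns, so a key lookup for one row scans just that row's slice of the key column.
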
> >
> > Map<String,Int>
> >
> > Column_1             Column_2                 Column_3
> > Map_Salary_Breakup   Map_Salary_Breakup.key   Map_Salary_Breakup.value
> > 3,1                  Fixed                    1,00,000
> >                      Bonus                    30,000
> >                      Stock                    40,000
> > 2,4                  Fixed                    1,40,000
> >                      Bonus                    30,000
> > 3,6                  Fixed                    1,20,000
> >                      Bonus                    20,000
> >                      Stock                    30,000
> >
> > Regards
> > Vimal
>
> --
> View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Discussion-New-feature-Support-Complex-Data-Type-Map-in-Carbon-Data-tp1969p1985.html
> Sent from the Apache CarbonData Mailing List archive mailing list archive at Nabble.com.

--
Thanks & Regards,
Ravi
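
For anyone following the data loading section of the quoted design, here is a minimal Python sketch of how one csv cell could be split into a map with the two delimiters. `parse_map_cell` is a hypothetical helper, not a CarbonData API. It assumes keys and values never contain the delimiter characters, and it keeps values as plain strings.

```python
from typing import Dict


def parse_map_cell(cell: str, level_1: str = "$", level_2: str = ":") -> Dict[str, str]:
    """Split one csv cell into key-value pairs (hypothetical helper).

    level_1 separates two key-value pairs; level_2 separates a key from its value.
    """
    result: Dict[str, str] = {}
    if not cell:
        return result  # an empty cell is treated as an empty map (assumption)
    for pair in cell.split(level_1):
        key, sep, value = pair.partition(level_2)
        if not sep:
            raise ValueError(f"missing {level_2!r} in pair {pair!r}")
        result[key] = value
    return result


breakup = parse_map_cell("Fixed:100,000$Bonus:30,000$Stock:40,000")
print(breakup["Fixed"])  # 100,000  (analogous to salary_breakup["Fixed"])
print(len(breakup))      # 3        (analogous to the proposed size() function)
```

Note that the sample values contain commas, so in a comma-delimited file the whole cell would have to be quoted for it to stay in a single csv field.
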