Definitely +1. Please feel free to create a JIRA issue and a PR.

Regards,
Jacky

> On Dec 20, 2019, at 7:55 AM, Ajantha Bhat <[email protected]> wrote:
>
> Currently carbondata "insert into" uses the CarbonLoadDataCommand itself.
> The load process has steps like the parsing and converter steps, with bad
> record support.
> Insert into doesn't require these steps, as the data is already validated
> and converted from the source table or dataframe.
>
> Some identified changes are below.
>
> 1. Need to refactor and separate load and insert at the driver side to skip
> the converter step and unify the flow for no sort and global sort insert.
> 2. Need to avoid reordering each row, by changing the select dataframe's
> projection order itself during the insert into.
> 3. For carbon to carbon insert, need to provide the ReadSupport and use
> RecordReader (vector reader currently doesn't support ReadSupport) to
> handle null values and timestamp cutoff (direct dictionary) from the
> scanRDD result.
> 4. Need to handle insert into partition/non-partition tables in the local
> sort, global sort, no sort, range columns, and compaction flows.
>
> The final goal is to improve insert performance by keeping only the
> required logic, and also to decrease the memory footprint.
>
> If you have any other suggestions or optimizations related to this, let me
> know.
>
> Thanks,
> Ajantha
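
To make point 2 of the quoted proposal concrete, here is a minimal sketch in plain Python. It is not CarbonData or Spark code: the columnar `SOURCE` table, the `scan` helper, and both insert functions are hypothetical names invented for illustration. It only shows the idea that requesting columns in the target order at scan time removes the need to rearrange every row afterwards.

```python
from typing import Dict, Iterator, List, Sequence, Tuple

# Hypothetical columnar source table; column names and values are made up.
SOURCE: Dict[str, List[object]] = {
    "name": ["a", "b"],
    "id": [1, 2],
    "city": ["x", "y"],
}


def scan(table: Dict[str, List[object]],
         projection: Sequence[str]) -> Iterator[Tuple[object, ...]]:
    """Yield rows whose fields follow the requested projection order."""
    columns = [table[col] for col in projection]
    return zip(*columns)


def insert_with_row_reorder(table: Dict[str, List[object]],
                            select_order: Sequence[str],
                            target_order: Sequence[str]) -> List[Tuple[object, ...]]:
    """Scan in the select order, then rearrange the fields of every row."""
    index = [select_order.index(col) for col in target_order]
    return [tuple(row[i] for i in index) for row in scan(table, select_order)]


def insert_with_projection_reorder(table: Dict[str, List[object]],
                                   target_order: Sequence[str]) -> List[Tuple[object, ...]]:
    """Ask the scan for the target order directly; no per-row rearranging."""
    return list(scan(table, target_order))


select_order = ["name", "id", "city"]   # order written in the SELECT
target_order = ["id", "city", "name"]   # order assumed for the target table

reordered = insert_with_row_reorder(SOURCE, select_order, target_order)
projected = insert_with_projection_reorder(SOURCE, target_order)
```

Both functions produce the same rows; the second one simply moves the ordering decision from the per-row path into the projection, which is the kind of saving the proposal appears to be after.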
