Hi Jacky,

1. Yes. It is better to keep all sorting logic in one step so that other
types of sort can be implemented easily. I will update the design.

2. EncoderProcessorStep does dictionary encoding and converts
no-dictionary and complex types to their byte[] representation.
    The encoding interface here is flexible, so the user can supply a
different encoding representation, but only at row level.
    RLE, delta and heavy compression are done only in
DataWriterProcessorStep, because these encodings/compressions happen
at blocklet level, not at row level.
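
To illustrate the row-level part only, here is a minimal sketch of what such
an encoder interface could look like. FieldEncoder, DictionaryFieldEncoder,
BytesFieldEncoder and RowEncoder are hypothetical names used for illustration,
not the actual classes in the design, and the in-memory dictionary is an
assumption made to keep the example small.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical row-level encoder: one column value in, one encoded value out.
interface FieldEncoder {
    Object encode(Object value);
}

// Dictionary encoding: each distinct value gets a surrogate integer key.
// Assumption: a simple in-memory map stands in for the real dictionary.
class DictionaryFieldEncoder implements FieldEncoder {
    private final Map<Object, Integer> dictionary = new HashMap<>();

    @Override
    public Object encode(Object value) {
        Integer key = dictionary.get(value);
        if (key == null) {
            key = dictionary.size() + 1; // keys start at 1 in this sketch
            dictionary.put(value, key);
        }
        return key;
    }
}

// No-dictionary column: the value is converted to its byte[] representation.
class BytesFieldEncoder implements FieldEncoder {
    @Override
    public Object encode(Object value) {
        return String.valueOf(value).getBytes(StandardCharsets.UTF_8);
    }
}

// Applies one encoder per column to a single row.
class RowEncoder {
    static Object[] encodeRow(Object[] row, FieldEncoder[] encoders) {
        Object[] encoded = new Object[row.length];
        for (int i = 0; i < row.length; i++) {
            encoded[i] = encoders[i].encode(row[i]);
        }
        return encoded;
    }
}
```

Blocklet-level encodings such as RLE or delta are deliberately left out of
this sketch, since they need a whole column chunk rather than a single row.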

3. Yes, each step requires a schema definition. It will be passed as
DataField[] through the configuration to the initial step,
InputProcessorStep. The remaining steps can call child.getOutput() to
get the schema. Here each DataField represents one column.
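
A minimal sketch of how the schema could flow through the steps is below.
DataField, InputProcessorStep, EncoderProcessorStep and getOutput() are the
names used in this thread; the ProcessorStep base class, the constructors and
the DataField members are assumptions for illustration, and the schema is
handed to the constructor directly instead of through a configuration object.

```java
// One DataField describes one column (the fields shown are assumed).
class DataField {
    final String name;
    final String type;

    DataField(String name, String type) {
        this.name = name;
        this.type = type;
    }
}

// Hypothetical base class: every step knows its child and exposes its schema.
abstract class ProcessorStep {
    protected final ProcessorStep child;

    ProcessorStep(ProcessorStep child) {
        this.child = child;
    }

    abstract DataField[] getOutput();
}

// Initial step: the only one that receives the schema from outside.
class InputProcessorStep extends ProcessorStep {
    private final DataField[] schema;

    InputProcessorStep(DataField[] schema) {
        super(null); // no child, this is the first step
        this.schema = schema;
    }

    @Override
    DataField[] getOutput() {
        return schema;
    }
}

// A later step: it asks its child for the schema instead of being configured.
class EncoderProcessorStep extends ProcessorStep {
    EncoderProcessorStep(ProcessorStep child) {
        super(child);
    }

    @Override
    DataField[] getOutput() {
        // In this sketch the step does not change the set of columns.
        return child.getOutput();
    }
}
```

With this shape only the first step depends on the input format, so a JSON or
Avro reader would just need to produce rows matching the same DataField[].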


On 12 October 2016 at 09:38, Jacky Li <jacky.li...@qq.com> wrote:

> Hi Ravindra,
> Regarding the design
> (https://drive.google.com/file/d/0B4TWTVbFSTnqTF85anlDOUQ5S1BqY
> zFpLWcwZnBLSVVqSWpj/view),
> I have following question:
> 1. In SortProcessorStep, I think it is better to include MergeSort in this
> step also, so it includes all logic for sorting. In this case, developer
> can
> implement a external sort (spill to files only if necessary), then the
> loading process is a on-line sorting if memory is sufficient. I think it
> will improve loading performance a lot.
> 2. In EncoderProcessorStep, apart from the dictionary encoding, what other
> processing it will do? How about delta, RLE, etc.
> 3. In InputProcessorStep, it needs some schema definition to parse the
> input
> and convert to the row, right? For example, how to read from JSON, AVRO
> file?
> Regards,
> Jacky

Thanks & Regards,
