See below.

On Tue, Apr 16, 2013 at 11:30 AM, Timothy Chen <[email protected]> wrote:
> Hi Jacques,
>
> Want to ask some questions I forgot to bring up in the meetup:
>
> 1, Can you elaborate some items on the last slide what they are:
> - Execution fragment format

A physical plan provides information about parallelization but doesn't include actual node assignments. The execution engine is responsible for converting the physical plan into an execution plan, which includes node-level assignments. This is then broken into pieces so that each node gets only its respective piece. These are the execution fragments.

> - Forman

This is an accidental misspelling of Foreman. The Foreman drives execution of one particular query, dealing with bit-level status messages, warnings, errors, and cancellation.

> 2, The in-memory format that supports either ValueVector, RLE or Dict, I
> assume RLE or Dict will be leveraging either Orc or Parquet right?

Kind of. RLE and Dict are abstractions where a particular operator can take advantage of the nature of that encoding. Parquet and ORC are really container formats as opposed to field-level formats. I believe both are going to support multiple internal encodings within the container (for example, Parquet uses RLE to manage repetition-level storage and ORC has a dictionary coding capability). Once we start to work through Dict and RLE, we could very likely leverage one of the encoding formats used within one of these systems. The hope would be that whatever we pick would be a cheap translation from/to either format if it isn't exactly the same.

> Tim
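
To make the point about an operator taking advantage of the nature of an encoding a bit more concrete, here is a minimal sketch in Python. It is only an illustration under assumptions: the names `rle_encode`, `rle_decode`, `sum_rle`, and `count_equal_rle` are hypothetical, the run layout of (value, run_length) pairs is chosen for readability, and none of it reflects the actual in-memory format or the Parquet/ORC encodings.

```python
from itertools import groupby
from typing import Iterable, List, Tuple

# Hypothetical run layout: (value, run_length). Not an actual on-disk or in-memory format.
Run = Tuple[int, int]


def rle_encode(values: Iterable[int]) -> List[Run]:
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


def rle_decode(runs: Iterable[Run]) -> List[int]:
    """Expand (value, run_length) pairs back into a flat list of values."""
    out: List[int] = []
    for value, length in runs:
        out.extend([value] * length)
    return out


def sum_rle(runs: Iterable[Run]) -> int:
    """Sum a column directly on its runs: one multiply per run, no decoding."""
    return sum(value * length for value, length in runs)


def count_equal_rle(runs: Iterable[Run], target: int) -> int:
    """Count rows equal to target by comparing once per run instead of once per row."""
    return sum(length for value, length in runs if value == target)


if __name__ == "__main__":
    column = [7, 7, 7, 2, 2, 9]
    runs = rle_encode(column)
    print(runs)
    print(sum_rle(runs), sum(column))
    print(count_equal_rle(runs, 7))
```

The aggregate and the filter both work per run, so their cost scales with the number of runs and not the number of rows. That is the kind of shortcut an encoding-aware operator could use, while an operator that does not understand the encoding can always fall back to `rle_decode`.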
