The major reason for looking at row/column-major differences is the amount of transformation necessary to go from one format to the other. With row-major formats we will have to do a transformation similar to providing results to an ODBC or JDBC system, which will expect them represented as individual records. This process is conceptually simple, but it is also time-consuming, and many existing Java programs/libraries are likely to use practices that will slow Drill down too much. The biggest one I am concerned about right now is new object allocation, but I'm sure we will find other inefficiencies as well.
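To make the allocation concern concrete, here is a minimal sketch of the two shapes the row-major translation can take. All of the names below are made up for illustration (this is not a real Drill, ODBC, or JDBC interface); the only point is the allocation pattern in the hot loop.

import java.io.IOException;

// Stand-in types, invented for this sketch.
interface ColumnAccessor {
  Object valueAt(int index);   // random access into one value vector
}

interface RowMajorWriter {
  void writeRecord(Object[] record) throws IOException;
}

class RowMajorTranslation {

  // Naive shape: a fresh Object[] per record. This is the per-row
  // allocation (and eventual garbage collection) described above.
  static void writeNaive(ColumnAccessor[] cols, int rowCount,
                         RowMajorWriter out) throws IOException {
    for (int row = 0; row < rowCount; row++) {
      Object[] record = new Object[cols.length];   // one allocation per row
      for (int col = 0; col < cols.length; col++) {
        record[col] = cols[col].valueAt(row);
      }
      out.writeRecord(record);
    }
  }

  // Preferred shape: one container allocated up front and repopulated
  // for every row, so the hot loop itself allocates nothing.
  static void writeReusing(ColumnAccessor[] cols, int rowCount,
                           RowMajorWriter out) throws IOException {
    Object[] record = new Object[cols.length];     // allocated once
    for (int row = 0; row < rowCount; row++) {
      for (int col = 0; col < cols.length; col++) {
        record[col] = cols[col].valueAt(row);
      }
      out.writeRecord(record);
    }
  }
}

The catch is that many existing libraries only expose something like the first shape, so we may end up adding reuse-friendly methods to them ourselves.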
For column-major formats, we are likely to be able to write long runs of values from value vectors directly into the files. That said, there are some problems with value-level compression such as dictionary, RLE, and bit-packed encodings. There is also a consideration I forgot to mention in the document: the representation of nulls in the various formats. In Drill we leave empty spaces for nulls in VVs because we want random access to values for fast pointer sorting. In storage formats, the primary concern is limiting the size of the data without incurring too much reading/record re-assembly overhead. Most of the space-efficient binary formats will likely leave nulls out (as is the case with Parquet, and I believe ORC as well). This will slow down the transformation a bit, as we will have to find runs of defined values to write all at once, skip each sequence of nulls in our VV, and continue writing when we find more defined values (rough sketch at the bottom of this mail). Even so, it will in many cases still be faster than pulling individual integers or strings out of a value vector and writing them through the row-major interfaces provided by their libraries.

-Jason

On Fri, Oct 4, 2013 at 7:32 PM, Timothy Chen <[email protected]> wrote:

> I see, didn't know we had plans to write results into various formats. If we can do that, it could even let other data processing tools integrate with Drill (which is probably the aim too?)
>
> So if we're just writing results to disk, I wonder why we need a writer interface that has to consider row/column-major differences?
>
> Can't we just take the in-memory VVs that are being produced and write a RecordBatch at a time directly to the formats we want?
>
> Tim
>
>
> On Fri, Oct 4, 2013 at 11:24 AM, Jason Altekruse <[email protected]> wrote:
>
> > Tim,
> >
> > Answers to your questions are below. I am almost always available after 2pm your time; feel free to send me some dates/times that work for you.
> >
> > - Maybe a bit more context? A writer interface doesn't seem to suggest what it really is about. Also, if this is focused on writing (from record reader into Drill VVs), why are there so many comments around reading in your considerations?
> > - I don't see any writer interface proposed?
> >
> > There actually isn't a writer interface written yet. The document I shared contains some thoughts I'm compiling about what the writer interface needs to handle. I hope to gather as much information as possible about the various formats before proposing a hard interface. I believe there could be a lot of value in trying to generalize the readers and writers, even across formats. I'm hoping it will minimize the burden of maintaining support for formats as they evolve, as well as of updating the readers and writers as the value vectors become more complex (compressed representations of data in memory, dictionary encodings, etc.).
> >
> > The reader interface was included for reference in the document because I believe we should work on the reader and writer together, as both have many similar properties and really just perform a translation in opposite directions.
> >
> > For clarity, the writer interface is what will allow us to enable a create table operation and store results to disk. Obviously we will want to support a variety of formats, as most users will likely want to export in formats they are used to working with, because Drill will likely not be the only tool they use to analyze their data.
> >
> > As Drill is not designed for batch jobs, this really is not designed for converting large volumes of data between formats, because long-running queries can die and are not recoverable in Drill.
> >
> > - Some of the considerations you're putting in columnar also apply to row major as well:
> >   - compression (ie: Avro compresses per block).
> >   - schema changes can happen in both
> > - What are we writing to disk? And why does columnar require a larger in-memory structure to be written to disk?
> >
> > The compression in row-major formats is definitely an important consideration. When this is the case we will have to buffer a large number of records in memory before writing to disk. With simple formats like CSV we can buffer as many or as few records in memory as we like before actually writing; buffering more will likely be better, to reduce disk overhead.
> >
> > While schema changes can happen in both, we don't have to worry about them when writing values to disk, except for formats with defined schemas per block. In a CSV it is completely possible to have additional columns in one of the rows (while the format is very limited, you couldn't really leave out a column without there being a problem). While the value vectors would not handle a schema change on every value during reading, the reality is that this arrangement of data is unlikely to come out of Drill because of the single-schema-per-batch design. A rapid schema change every few records could only be represented by a series of very short batches, something we will try to avoid.
> >
> > This does speak to the consideration I brought up in the document about how to handle frequent schema changes, as it might make sense to go back and re-write some data if we figure out that the next batch has an additional field. This type of scenario would otherwise require us to start a new Parquet file, for example.
> >
> > - I don't quite get why row major requires additional objects to be passed to the writer?
> >
> > Many existing interfaces are written with Java conventions in mind; object passing is common for representing a series of values in a single row. If we create a new object for each row we pass into their writing interface, there would be a lot of object allocation and garbage collection. This is obviously something we want to avoid.
> >
> > When we are considering the reader interface, it is possible that an existing interface will pass us back a new object each time it reads a record, allocating a fresh object on every call. We will want to go in and add new methods that allow for passing existing objects in and having the libraries for the various readers just populate them; this will also prevent excessive garbage collection.
> >
> > On Wed, Oct 2, 2013 at 11:19 AM, Jason Altekruse <[email protected]> wrote:
> >
> > > Please provide any feedback on the start of the writer interface described in the attached document. It should be a more formalized interface in the next few days.
> > >
> > > -Jason
> > >
> > > ---------- Forwarded message ----------
> > > From: Jason Altekruse <[email protected]>
> > > Date: Wed, Oct 2, 2013 at 1:12 PM
> > > Subject: Fwd: Writer interface start
> > > To: [email protected]
> > >
> > >
> > > ---------- Forwarded message ----------
> > > From: Jason Altekruse <[email protected]>
> > > Date: Wed, Oct 2, 2013 at 12:31 PM
> > > Subject: Writer interface start
> > > To: Jacques Nadeau <[email protected]>, Ben Becker <[email protected]>, Steven Phillips <[email protected]>
> > >
> > >
> > > A quick update on the status of the writer interface. I haven't written it formally yet, but I put together a document describing the important design considerations for various formats, trying to be as general as possible. It should be fleshed out in more detail in the next few days.
> > >
> > > See attached
> > > -Jason
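P.S. To make the null-run idea at the top of this mail concrete, here is a rough sketch of the single pass over a value vector that it implies. The boolean[] stands in for a null bitmap, the int[] for a vector's data buffer, and RunWriter is a made-up sink (the real Parquet and ORC writer APIs differ); only the loop structure matters.

import java.io.IOException;

// Hypothetical sink for a format that omits nulls: defined values are
// appended in bulk, null gaps only contribute definition information.
interface RunWriter {
  void writeRun(int[] values, int start, int length) throws IOException;
  void markNulls(int count) throws IOException;
}

class NullRunColumnWriter {

  // One pass over the vector: each run of defined values goes out in a
  // single bulk call, and each run of nulls is skipped in the data but
  // recorded so the reader can re-assemble the column later.
  static void write(int[] values, boolean[] isSet, RunWriter out)
      throws IOException {
    int i = 0;
    while (i < values.length) {
      int start = i;
      if (isSet[i]) {
        while (i < values.length && isSet[i]) i++;   // extend the defined run
        out.writeRun(values, start, i - start);
      } else {
        while (i < values.length && !isSet[i]) i++;  // skip the null run
        out.markNulls(i - start);
      }
    }
  }
}

The bulk writeRun calls are where this should beat per-value row-major writing; the null gaps cost only whatever bookkeeping the format needs for re-assembly.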
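Similarly, for the CSV buffering point in the quoted thread above: since CSV has no block structure, rows can be flushed at any boundary, and any buffered writer that batches many rows into each disk write will do. A trivial sketch (the buffer size is arbitrary, and quoting/escaping is omitted for brevity):

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

class CsvSink implements java.io.Closeable {
  private final BufferedWriter out;

  CsvSink(String path) throws IOException {
    // A large write buffer batches many rows per actual disk write.
    out = new BufferedWriter(new FileWriter(path), 1 << 20);  // ~1 MiB
  }

  void writeRow(String[] fields) throws IOException {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < fields.length; i++) {
      if (i > 0) sb.append(',');
      sb.append(fields[i]);  // no quoting/escaping, for brevity
    }
    out.write(sb.toString());
    out.newLine();
  }

  public void close() throws IOException {
    out.close();             // flushes whatever is still buffered
  }
}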
