paul-rogers commented on issue #1870: DRILL-7359: Add support for DICT type in 
RowSet Framework
URL: https://github.com/apache/drill/pull/1870#issuecomment-541440586
 
 
   Let's continue to assume that `DICT` is, essentially, `DICT<KEY_TYPE, 
VALUE_TYPE>` and that we can think of the `DICT`, when writing, as a pair of 
arrays: one for keys, one for values. Because vectors are write-once, we have 
to add (key, value) pairs one by one. If so, then we need a new form of writer, 
a `DictWriter`, that has semantics such as:
   
   ```
     ObjectWriter key();
     ObjectWriter value();
     void save();
   ```
   
   The `key()` method returns an object writer for the key. If we restrict 
keys to be scalars, then we can just do:
   
   ```
     ScalarWriter key();
   ```
   
   Values can, I imagine, be of any type, so the `ObjectWriter` lets us work 
with them generically. Given a `DICT<VARCHAR,DOUBLE>`, we could do:
   
   ```
     DictWriter dictWriter = rowWriter.dict("myDict");
     // First (key, value) pair
     dictWriter.key().setString("fred");
     dictWriter.value().scalar().setDouble(123.45);
     dictWriter.save();
     // Second (key, value) pair
     dictWriter.key().setString("barney");
     dictWriter.value().scalar().setDouble(98.76);
     dictWriter.save();
   ```
   
   This means that, in the `ObjectWriter` interface, we need to add a new 
method, `dict()`, which will return a `DictWriter`, and we need to define the 
`DictWriter` interface.
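
   To make the `key()` / `value()` / `save()` semantics concrete, here is a 
minimal, self-contained sketch. Everything in it is a hypothetical stand-in: 
the interfaces are cut down to the few methods used above, and 
`ListDictWriter`, `PendingScalar` and `writeExample()` are names invented for 
illustration that buffer pairs in plain Java lists rather than in vectors. It 
is not the actual column accessor code, just a picture of the "pair of arrays" 
idea.

```java
import java.util.ArrayList;
import java.util.List;

public class DictWriterSketch {

  // Hypothetical, simplified stand-ins for the writer interfaces discussed
  // above. They are NOT the real column accessor interfaces.
  public interface ScalarWriter {
    void setString(String value);
    void setDouble(double value);
  }

  public interface ObjectWriter {
    ScalarWriter scalar();
  }

  public interface DictWriter {
    ScalarWriter key();    // keys restricted to scalars
    ObjectWriter value();  // values may be of any type
    void save();           // commit the current (key, value) pair
  }

  // Toy writer that holds a single pending value until save() is called.
  public static class PendingScalar implements ScalarWriter, ObjectWriter {
    Object pending;

    @Override public void setString(String value) { pending = value; }
    @Override public void setDouble(double value) { pending = value; }
    @Override public ScalarWriter scalar() { return this; }
  }

  // Toy DictWriter: the "pair of arrays" is modeled as two parallel lists.
  public static class ListDictWriter implements DictWriter {
    public final List<Object> keys = new ArrayList<>();
    public final List<Object> values = new ArrayList<>();
    private final PendingScalar keyWriter = new PendingScalar();
    private final PendingScalar valueWriter = new PendingScalar();

    @Override public ScalarWriter key() { return keyWriter; }
    @Override public ObjectWriter value() { return valueWriter; }

    @Override public void save() {
      if (keyWriter.pending == null) {
        throw new IllegalStateException("key must be set before save()");
      }
      // Append one entry to each list, then clear the pending pair.
      keys.add(keyWriter.pending);
      values.add(valueWriter.pending);
      keyWriter.pending = null;
      valueWriter.pending = null;
    }
  }

  // Writes the two pairs from the DICT<VARCHAR,DOUBLE> example above.
  public static ListDictWriter writeExample() {
    ListDictWriter dictWriter = new ListDictWriter();
    dictWriter.key().setString("fred");
    dictWriter.value().scalar().setDouble(123.45);
    dictWriter.save();
    dictWriter.key().setString("barney");
    dictWriter.value().scalar().setDouble(98.76);
    dictWriter.save();
    return dictWriter;
  }

  public static void main(String[] args) {
    ListDictWriter w = writeExample();
    System.out.println(w.keys + " -> " + w.values);
  }
}
```

   The point of the sketch is only that `save()` is what advances both 
"arrays" together, so a key and its value always land at the same index.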
   
   There is quite a bit of commentary in the column accessor package that 
tries to explain the structure behind these writers. Perhaps that might help 
explain the ideas here.
