Thanks Wes,

With that in mind, I’m searching for a public API that returns the MAX length
value for BYTE_ARRAY columns.  Can you point me to an example?

-Brian

On 9/12/19, 5:34 PM, "Wes McKinney" <wesmck...@gmail.com> wrote:

    The memory references returned by ReadBatch are not guaranteed to
    persist from one function call to the next. So you need to copy the
    ByteArray data into your own data structures before calling ReadBatch
    again.
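    
    A minimal sketch of that copy step (not from the thread; it assumes the
    parquet-cpp ByteArrayReader API, with illustrative names):
    
        #include <parquet/api/reader.h>
        
        #include <cstdint>
        #include <string>
        #include <vector>
        
        // Read one batch of a BYTE_ARRAY column and copy every value into
        // owned std::string storage.  The ByteArray::ptr fields filled in by
        // ReadBatch point into reader-owned buffers (e.g. a decoded
        // dictionary page) that may be invalidated by later ReadBatch calls.
        std::vector<std::string> CopyByteArrayBatch(
            parquet::ByteArrayReader* reader, int64_t rows_to_read) {
          std::vector<parquet::ByteArray> descriptors(rows_to_read);
          int64_t values_read = 0;
          reader->ReadBatch(rows_to_read, nullptr, nullptr,
                            descriptors.data(), &values_read);
          std::vector<std::string> owned;
          owned.reserve(values_read);
          for (int64_t i = 0; i < values_read; ++i) {
            owned.emplace_back(
                reinterpret_cast<const char*>(descriptors[i].ptr),
                descriptors[i].len);
          }
          return owned;
        }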
    
    Column readers for different columns are independent from each other.
    So function calls for column 7 should not affect anything having to do
    with column 4.
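    
    Sketched concretely, with the parquet-cpp file/row-group reader API (the
    path and column ordinals here are placeholders):
    
        #include <parquet/api/reader.h>
        
        #include <cstdint>
        #include <memory>
        #include <string>
        
        void ReadTwoByteArrayColumns(const std::string& path) {
          std::unique_ptr<parquet::ParquetFileReader> file =
              parquet::ParquetFileReader::OpenFile(path);
          std::shared_ptr<parquet::RowGroupReader> row_group = file->RowGroup(0);
        
          // Each Column() call returns a reader with its own decoding state,
          // including any dictionary pages it decodes.
          std::shared_ptr<parquet::ColumnReader> c4 = row_group->Column(4);
          std::shared_ptr<parquet::ColumnReader> c7 = row_group->Column(7);
          auto* col4 = static_cast<parquet::ByteArrayReader*>(c4.get());
          auto* col7 = static_cast<parquet::ByteArrayReader*>(c7.get());
        
          parquet::ByteArray v4[64], v7[64];
          int64_t read4 = 0, read7 = 0;
          col4->ReadBatch(64, nullptr, nullptr, v4, &read4);
          // This call only touches col7's state; it should not disturb col4.
          // The v4 pointers are still only guaranteed until the next
          // ReadBatch on col4, so copy that data first (see the sketch above).
          col7->ReadBatch(64, nullptr, nullptr, v7, &read7);
        }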
    
    On Thu, Sep 12, 2019 at 4:29 PM Brian Bowman <brian.bow...@sas.com> wrote:
    >
    > All,
    >
    > I’m debugging a low-level API Parquet reader case where the table has
    > DOUBLE, BYTE_ARRAY, and FIXED_LENGTH_BYTE_ARRAY types.
    >
    > Four of the columns (ordinally 3, 4, 7, 9) are of type BYTE_ARRAY.
    >
    > In the following ReadBatch(), rowsToRead is already set to all rows in
    > the Row Group.  The quantity is verified by the return value in values_read.
    >
    >       byte_array_reader->ReadBatch(rowsToRead, nullptr, nullptr, rowColPtr, &values_read);
    >
    > Column 4 is dictionary encoded.  Upon return from its ReadBatch() call,
    > the result vector of BYTE_ARRAY descriptors (rowColPtr) has correct
    > len/ptr pairs pointing into a decoded dictionary string, although not
    > into the original dictionary values in the .parquet file being read.
    >
    > As soon as the ReadBatch() call is made for the next BYTE_ARRAY column
    > (#7), a new DICTIONARY_PAGE is read and the BYTE_ARRAY descriptor values
    > for column 4 are trashed.
    >
    > Is this expected behavior or a bug?  If expected, then it seems the
    > dictionary values for Column 4 (… or any BYTE_ARRAY column that is
    > dictionary-compressed) should be copied and the descriptor vector
    > addresses back-patched, BEFORE invoking ReadBatch() again.  Is this the
    > case?
    >
    > Thanks for clarifying,
    >
    >
    > -Brian
    >
    >
    >
    >
    
