Looks like the advice I gave you was a bit off. The method you want is this:
this.buffer = fragmentContext.getManagedBuffer();
The above allocates a 256-byte buffer. You can allocate a larger one up front:
this.buffer = fragmentContext.getManagedBuffer(4096);
Or, to reallocate:
buffer = fragmentContext.replace(buffer, 8192);
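
Putting those together, I’d guess the usage looks roughly like this (an
untested sketch; getManagedBuffer() and replace() are the calls above, and I’m
assuming your reader keeps fragmentContext, buffer, and the map writer as
fields, as in your code below):

// In setup: start with a 4K managed buffer instead of the 256-byte default.
this.buffer = fragmentContext.getManagedBuffer(4096);

// Per value: grow the managed buffer if this value won’t fit.
if (stringLength > buffer.capacity()) {
    buffer = fragmentContext.replace(buffer, stringLength);
}
buffer.setBytes(0, bytes, 0, stringLength);
map.varChar(fieldName).writeVarChar(0, stringLength, buffer);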
Again, I’ve not used these methods myself, but it seems they might do the trick.
- Paul
> On Jan 26, 2017, at 9:51 PM, Charles Givre <[email protected]> wrote:
>
> Thanks! I’m hoping to submit a PR eventually once I have this all done. I
> tried your changes and now I’m getting this error:
>
> 0: jdbc:drill:zk=local> select * from dfs.client.`small.misolog`;
> Error: DATA_READ ERROR: Tried to remove unmanaged buffer.
>
> Fragment 0:0
>
> [Error Id: 52fc846a-1d94-4300-bcb4-7000d0949b3c on
> charless-mbp-2.fios-router.home:31010] (state=,code=0)
>
>
>> On Jan 26, 2017, at 23:08, Paul Rogers <[email protected]> wrote:
>>
>> Hi Charles,
>>
>> Very cool plugin!
>>
>> My knowledge in this area is a bit sketchy… That said, the problem appears
>> to be that the code does not extend the DrillBuf to ensure it has sufficient
>> capacity. Try calling reallocIfNeeded, something like this:
>>
>> this.buffer = this.buffer.reallocIfNeeded(stringLength); // keep the returned buffer
>> this.buffer.setBytes(0, bytes, 0, stringLength);
>> map.varChar(fieldName).writeVarChar(0, stringLength, buffer);
>>
>> Then, comment out the 256 length hack and see if it works.
>>
>> To avoid memory fragmentation, maybe change your loop to something like this:
>>
>> int maxRecords = MAX_RECORDS_PER_BATCH;
>> int maxWidth = 256;
>> while (recordCount < maxRecords &&
>>        (line = this.reader.readLine()) != null) {
>>   …
>>   if (stringLength > maxWidth) {
>>     maxWidth = stringLength;
>>     maxRecords = 16 * 1024 * 1024 / maxWidth;
>>   }
>> }
>>
>> The above is not perfect: the last record added might be much larger than
>> the others, causing the corresponding vector to grow beyond 16 MB, but the
>> occasional large vector should be OK.
>>
>> Thanks,
>>
>> - Paul
>>
>> On Jan 26, 2017, at 5:31 PM, Charles Givre <[email protected]> wrote:
>>
>> Hi Paul,
>> Would you mind taking a look at my code? I’m wondering if I’m doing this
>> correctly. For context, I’m working on a generic log file reader for Drill
>> (https://github.com/cgivre/drill-logfile-plugin), and I encountered some
>> errors when working with fields that were > 256 characters long. It isn’t a
>> storage plugin, but it extends the EasyFormatPlugin.
>>
>> I added some code to truncate the strings to 256 characters, and that
>> worked. Before that, it was throwing the error shown below:
>>
>> Error: DATA_READ ERROR: index: 0, length: 430 (expected: range(0, 256))
>>
>> Fragment 0:0
>>
>> [Error Id: b2250326-f983-440c-a73c-4ef4a6cf3898 on
>> charless-mbp-2.fios-router.home:31010] (state=,code=0)
>>
>>
>> The query that generated this was just a SELECT * FROM dfs.`file`. Also,
>> how do I set the size of each row batch?
>> Thank you for your help.
>> — C
>>
>>
>> if (m.find()) {
>>   for (int i = 1; i <= m.groupCount(); i++) {
>>     // TODO Add option for date fields
>>     String fieldName = fieldNames.get(i - 1);
>>     String fieldValue = m.group(i);
>>
>>     if (fieldValue == null) {
>>       fieldValue = "";
>>     }
>>     byte[] bytes = fieldValue.getBytes("UTF-8");
>>
>>     // Added this and it worked…
>>     int stringLength = bytes.length;
>>     if (stringLength > 256) {
>>       stringLength = 256;
>>     }
>>
>>     this.buffer.setBytes(0, bytes, 0, stringLength);
>>     map.varChar(fieldName).writeVarChar(0, stringLength, buffer);
>>   }
>> }
>>
>>
>> On Jan 26, 2017, at 20:20, Paul Rogers <[email protected]> wrote:
>>
>> Hi Charles,
>>
>> The Varchar column can hold any length of data. We’ve recently been working
>> on tests that have columns up to 8K in length.
>>
>> The one caveat is that, when working with data larger than 256 bytes, you
>> must be extremely careful in your reader. The out-of-the-box text reader
>> always reads 64K rows, which (due to various issues) can cause memory
>> fragmentation and OOM errors when used with columns wider than 256 bytes.
>>
>> If you are developing your own storage plugin, adjust the size of each row
>> batch so that no single vector is larger than 16 MB. Then you can use
>> columns of any size.
>>
>> Suppose your logs contain text lines up to, say, 1K in size. This means
>> each record batch your reader produces must hold no more than 16 MB / 1K
>> per row = 16K rows (rather than the usual 64K).
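>>
>> For instance, the batch-size arithmetic might look like this (illustrative
>> sketch only, not Drill API; the variable names are mine, while the 16 MB
>> limit and the 64K default come from the reasoning above):
>>
>> static final int MAX_VECTOR_BYTES = 16 * 1024 * 1024; // 16 MB per-vector cap
>> static final int DEFAULT_MAX_RECORDS = 64 * 1024;     // usual 64K-row batch
>>
>> int maxRowWidthBytes = 1024; // e.g. 1K log lines
>> int maxRecordsPerBatch =
>>     Math.min(DEFAULT_MAX_RECORDS, MAX_VECTOR_BYTES / maxRowWidthBytes); // 16K here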
>>
>> Once the data is in the Varchar column, the rest of Drill should “just work”
>> on that data.
>>
>> - Paul
>>
>> On Jan 26, 2017, at 4:11 PM, Charles Givre <[email protected]> wrote:
>>
>> I’m working on a plugin to read log files and the data has some long
>> strings. Is there a data type that can hold strings longer than 256
>> characters?
>> Thanks,
>> — Charles
>>
>>
>>
>