Looks like the advice I gave you was a bit off. The method you want is this:
this.buffer = fragmentContext.getManagedBuffer();
The above allocates a 256-byte buffer. You can allocate a larger one up front:
this.buffer = fragmentContext.getManagedBuffer(4096);
Or, to reallocate:
buffer = fragmentContext.replace(buffer, 8192);
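
Putting those together, I’d guess the usage looks roughly like this (an
untested sketch; getManagedBuffer() and replace() are the calls above, and I’m
assuming your reader keeps fragmentContext, buffer, and the map writer as
fields, as in your code below):

// In setup: start with a 4K managed buffer instead of the 256-byte default.
this.buffer = fragmentContext.getManagedBuffer(4096);

// Per value: grow the managed buffer if this value won’t fit.
if (stringLength > buffer.capacity()) {
    buffer = fragmentContext.replace(buffer, stringLength);
}
buffer.setBytes(0, bytes, 0, stringLength);
map.varChar(fieldName).writeVarChar(0, stringLength, buffer);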
Again, I’ve not used these methods myself, but it seems they might do the trick.
- Paul
> On Jan 26, 2017, at 9:51 PM, Charles Givre <[email protected]> wrote:
>
> Thanks! I’m hoping to submit a PR eventually once I have this all done. I
> tried your changes and now I’m getting this error:
>
> 0: jdbc:drill:zk=local> select * from dfs.client.`small.misolog`;
> Error: DATA_READ ERROR: Tried to remove unmanaged buffer.
>
> Fragment 0:0
>
> [Error Id: 52fc846a-1d94-4300-bcb4-7000d0949b3c on
> charless-mbp-2.fios-router.home:31010] (state=,code=0)
>
>
>> On Jan 26, 2017, at 23:08, Paul Rogers <[email protected]> wrote:
>>
>> Hi Charles,
>>
>> Very cool plugin!
>>
>> My knowledge in this area is a bit sketchy… That said, the problem appears
>> to be that the code does not extend the DrillBuf to ensure it has sufficient
>> capacity. Try calling reallocIfNeeded, something like this:
>>
>> this.buffer = this.buffer.reallocIfNeeded(stringLength); // keep the returned buffer
>> this.buffer.setBytes(0, bytes, 0, stringLength);
>> map.varChar(fieldName).writeVarChar(0, stringLength, buffer);
>>
>> Then, comment out the 256 length hack and see if it works.
>>
>> To avoid memory fragmentation, maybe change your loop to something like this:
>>
>> int maxRecords = MAX_RECORDS_PER_BATCH;
>> int maxWidth = 256;
>> while (recordCount < maxRecords &&
>>        (line = this.reader.readLine()) != null) {
>>   …
>>   if (stringLength > maxWidth) {
>>     maxWidth = stringLength;
>>     maxRecords = 16 * 1024 * 1024 / maxWidth;
>>   }
>> }
>>
>> The above is not perfect: the last record added might be much larger than
>> the others, causing the corresponding vector to grow beyond 16 MB, but the
>> occasional large vector should be OK.
>>
>> Thanks,
>>
>> - Paul
>>
>> On Jan 26, 2017, at 5:31 PM, Charles Givre <[email protected]> wrote:
>>
>> Hi Paul,
>> Would you mind taking a look at my code? I’m wondering if I’m doing this
>> correctly. For context, I’m working on a generic log file reader for Drill
>> (https://github.com/cgivre/drill-logfile-plugin), and I encountered some
>> errors when working with fields that were > 256 characters long. It isn’t a
>> storage plugin, but it extends the EasyFormatPlugin.
>>
>> I added some code to truncate the strings to 256 characters, and that
>> worked. Before that, it was throwing the error shown below:
>>
>> Error: DATA_READ ERROR: index: 0, length: 430 (expected: range(0, 256))
>>
>> Fragment 0:0
>>
>> [Error Id: b2250326-f983-440c-a73c-4ef4a6cf3898 on
>> charless-mbp-2.fios-router.home:31010] (state=,code=0)
>>
>>
>> The query that generated this was just a SELECT * FROM dfs.`file`. Also,
>> how do I set the size of each row batch?
>> Thank you for your help.
>> — C
>>
>>
>> if (m.find()) {
>>   for (int i = 1; i <= m.groupCount(); i++) {
>>     // TODO Add option for date fields
>>     String fieldName = fieldNames.get(i - 1);
>>     String fieldValue = m.group(i);
>>
>>     if (fieldValue == null) {
>>       fieldValue = "";
>>     }
>>     byte[] bytes = fieldValue.getBytes("UTF-8");
>>
>>     // Added this and it worked…
>>     int stringLength = bytes.length;
>>     if (stringLength > 256) {
>>       stringLength = 256;
>>     }
>>
>>     this.buffer.setBytes(0, bytes, 0, stringLength);
>>     map.varChar(fieldName).writeVarChar(0, stringLength, buffer);
>>   }
>> }
>>
>>
>> On Jan 26, 2017, at 20:20, Paul Rogers <[email protected]> wrote:
>>
>> Hi Charles,
>>
>> The Varchar column can hold any length of data. We’ve recently been working
>> on tests that have columns up to 8K in length.
>>
>> The one caveat is that, when working with data larger than 256 bytes, you
>> must be extremely careful in your reader. The out-of-the-box text reader
>> always reads 64K rows, which (due to various issues) can cause memory
>> fragmentation and OOM errors when used with columns wider than 256 bytes.
>>
>> If you are developing your own storage plugin, adjust the size of each row
>> batch so that no single vector is larger than 16 MB. Then you can use
>> columns of any size.
>>
>> Suppose your logs contain text lines up to, say, 1K in size. This means
>> each record batch your reader produces must hold no more than 16 MB / 1K
>> per row = 16K rows (rather than the usual 64K).
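>>
>> For instance, the batch-size arithmetic might look like this (illustrative
>> sketch only, not Drill API; the variable names are mine, while the 16 MB
>> limit and the 64K default come from the reasoning above):
>>
>> static final int MAX_VECTOR_BYTES = 16 * 1024 * 1024; // 16 MB per-vector cap
>> static final int DEFAULT_MAX_RECORDS = 64 * 1024;     // usual 64K-row batch
>>
>> int maxRowWidthBytes = 1024; // e.g. 1K log lines
>> int maxRecordsPerBatch =
>>     Math.min(DEFAULT_MAX_RECORDS, MAX_VECTOR_BYTES / maxRowWidthBytes); // 16K here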
>>
>> Once the data is in the Varchar column, the rest of Drill should “just work”
>> on that data.
>>
>> - Paul
>>
>> On Jan 26, 2017, at 4:11 PM, Charles Givre <[email protected]> wrote:
>>
>> I’m working on a plugin to read log files and the data has some long
>> strings. Is there a data type that can hold strings longer than 256
>> characters?
>> Thanks,
>> — Charles
>>
>>
>>
>