Thanks! I’m hoping to submit a PR eventually once I have this all done. I tried your changes and now I’m getting this error:
0: jdbc:drill:zk=local> select * from dfs.client.`small.misolog`;
Error: DATA_READ ERROR: Tried to remove unmanaged buffer.

Fragment 0:0

[Error Id: 52fc846a-1d94-4300-bcb4-7000d0949b3c on charless-mbp-2.fios-router.home:31010] (state=,code=0)

> On Jan 26, 2017, at 23:08, Paul Rogers <[email protected]> wrote:
>
> Hi Charles,
>
> Very cool plugin!
>
> My knowledge in this area is a bit sketchy… That said, the problem appears to
> be that the code does not extend the DrillBuf to ensure it has sufficient
> capacity. Try calling reallocIfNeeded, something like this (note that it
> returns the buffer, which may be a new one, so keep the result):
>
>     this.buffer = this.buffer.reallocIfNeeded(stringLength);
>     this.buffer.setBytes(0, bytes, 0, stringLength);
>     map.varChar(fieldName).writeVarChar(0, stringLength, buffer);
>
> Then comment out the 256-length hack and see if it works.
>
> To avoid memory fragmentation, maybe change your loop as:
>
>     int maxRecords = MAX_RECORDS_PER_BATCH;
>     int maxWidth = 256;
>     while (recordCount < maxRecords && (line = this.reader.readLine()) != null) {
>       …
>       if (stringLength > maxWidth) {
>         maxWidth = stringLength;
>         maxRecords = 16 * 1024 * 1024 / maxWidth;
>       }
>     }
>
> The above is not perfect (the last record added might be much larger than the
> others, causing the corresponding vector to grow larger than 16 MB), but the
> occasional large vector should be OK.
>
> Thanks,
>
> - Paul
>
>> On Jan 26, 2017, at 5:31 PM, Charles Givre <[email protected]> wrote:
>>
>> Hi Paul,
>> Would you mind taking a look at my code? I’m wondering if I’m doing this
>> correctly. Just for context, I’m working on a generic log file reader for
>> Drill (https://github.com/cgivre/drill-logfile-plugin), and I encountered
>> some errors when working with fields that were > 256 characters long. It
>> isn’t a storage plugin, but it extends the EasyFormatPlugin.
>>
>> I added some code to truncate the strings to 256 chars, and it worked.
>> Before this it was throwing errors as shown below:
>>
>>     Error: DATA_READ ERROR: index: 0, length: 430 (expected: range(0, 256))
>>
>>     Fragment 0:0
>>
>>     [Error Id: b2250326-f983-440c-a73c-4ef4a6cf3898 on charless-mbp-2.fios-router.home:31010] (state=,code=0)
>>
>> The query that generated this was just a SELECT * FROM dfs.`file`. Also, how
>> do I set the size of each row batch?
>> Thank you for your help.
>> — C
>>
>>     if (m.find()) {
>>       for (int i = 1; i <= m.groupCount(); i++) {
>>         // TODO Add option for date fields
>>         String fieldName = fieldNames.get(i - 1);
>>         String fieldValue = m.group(i);
>>
>>         if (fieldValue == null) {
>>           fieldValue = "";
>>         }
>>         byte[] bytes = fieldValue.getBytes("UTF-8");
>>
>>         // Added this and it worked…
>>         int stringLength = bytes.length;
>>         if (stringLength > 256) {
>>           stringLength = 256;
>>         }
>>
>>         this.buffer.setBytes(0, bytes, 0, stringLength);
>>         map.varChar(fieldName).writeVarChar(0, stringLength, buffer);
>>       }
>>     }
>>
>>> On Jan 26, 2017, at 20:20, Paul Rogers <[email protected]> wrote:
>>>
>>> Hi Charles,
>>>
>>> The Varchar column can hold any length of data. We’ve recently been working
>>> on tests that have columns up to 8K in length.
>>>
>>> The one caveat is that, when working with data larger than 256 bytes, you
>>> must be extremely careful in your reader. The out-of-box text reader will
>>> always read 64K rows. This (due to various issues) can cause memory
>>> fragmentation and OOM errors when used with columns greater than 256 bytes
>>> in width.
>>>
>>> If you are developing your own storage plugin, then adjust the size of each
>>> row batch so that no single vector is larger than 16 MB in size. Then you
>>> can use any size of column.
>>>
>>> Suppose your logs contain text lines up to, say, 1K in size. This means that
>>> each record batch your reader produces must hold at most 16 MB / 1 KB per
>>> row = 16,384 rows (rather than the usual 64K).
>>> Once the data is in the Varchar column, the rest of Drill should “just work”
>>> on that data.
>>>
>>> - Paul
>>>
>>>> On Jan 26, 2017, at 4:11 PM, Charles Givre <[email protected]> wrote:
>>>>
>>>> I’m working on a plugin to read log files and the data has some long
>>>> strings. Is there a data type that can hold strings longer than 256
>>>> characters?
>>>> Thanks,
>>>> — Charles
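For reference, the "grow before write" pattern behind Paul's `reallocIfNeeded` fix can be illustrated with a plain byte array standing in for a `DrillBuf` (a real `DrillBuf` needs a Drill allocator, so this is only an analogy; the `GrowableBuffer` class and its method names are invented for illustration):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Analogy for the DrillBuf pattern: ensure capacity BEFORE copying bytes in,
// instead of truncating the value to the initial 256-byte capacity.
public class GrowableBuffer {
    private byte[] buf = new byte[256];

    /** Like DrillBuf.reallocIfNeeded: double capacity until the data fits,
     *  returning the (possibly replaced) backing array. */
    public byte[] reallocIfNeeded(int needed) {
        int cap = buf.length;
        while (cap < needed) {
            cap *= 2;
        }
        if (cap != buf.length) {
            buf = Arrays.copyOf(buf, cap);
        }
        return buf;
    }

    /** Writes the UTF-8 bytes of value at offset 0; returns the byte length. */
    public int write(String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        reallocIfNeeded(bytes.length);                    // grow first...
        System.arraycopy(bytes, 0, buf, 0, bytes.length); // ...then copy
        return bytes.length;
    }

    public int capacity() {
        return buf.length;
    }
}
```

A 430-byte value (the length from the error message above) would overflow the initial 256-byte capacity, but after the realloc step the buffer grows to 512 bytes and the copy succeeds.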
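Paul's batch-sizing suggestion can likewise be sketched as a small self-contained class: shrink the per-batch record limit whenever a wider field value is seen, so that record count times maximum field width stays within a 16 MB vector budget. The `BatchSizer` name and `observe` method are illustrative, not Drill APIs; the 16 MB budget and 64K default row count come from the thread above:

```java
// Sketch of the loop guard Paul describes: track the widest field seen so far
// and cap the number of records per batch so no value vector exceeds 16 MB.
public class BatchSizer {
    static final int VECTOR_BUDGET = 16 * 1024 * 1024;   // 16 MB per vector
    static final int MAX_RECORDS_PER_BATCH = 64 * 1024;  // Drill's usual 64K rows

    private int maxWidth = 256;
    private int maxRecords = MAX_RECORDS_PER_BATCH;

    /** Records that a field of the given byte length was seen and returns the
     *  current per-batch record limit; the reader loop would compare its
     *  running record count against this value. */
    public int observe(int stringLength) {
        if (stringLength > maxWidth) {
            maxWidth = stringLength;
            maxRecords = Math.min(MAX_RECORDS_PER_BATCH,
                                  VECTOR_BUDGET / maxWidth);
        }
        return maxRecords;
    }
}
```

As Paul notes, this is approximate: the record that triggers the shrink is already in the batch, so a single vector can still briefly exceed the budget.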
