(Reposting an answer sent earlier for those whose e-mail filters discard GitHub-related e-mails…)

Drill is an analytic engine optimized for numbers and short strings. At present, Drill’s practical limit on string (i.e. VarChar) length is about 256 characters.

Drill is vectorized: it tries to create “batches” of data with up to 64K records. When individual columns are wider than 256 bytes (on average), our vectors grow larger than 16 MB and we run into memory issues. If VarChar columns are 64K in size (the current maximum), we hit the vector limit with only 256 records. Unfortunately, at present, our readers and operators don’t know how to limit their batch sizes to such a low number (though we are actively working on a fix).

Thanks,

- Paul

> On Aug 29, 2017, at 6:27 PM, César Tenganán <[email protected]> wrote:
>
> Hi,
>
> We are working with a dataset that has columns of large Strings, and we
> are getting the error "UNSUPPORTED_OPERATION ERROR: Trying to write
> something big in a column columnIndex 0 Limit 65536 Failure while reading
> file ..."
>
> There is a validation in FieldVarCharOutput.java
> <https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text/compliant/FieldVarCharOutput.java>
> using MAX_FIELD_LENGTH.
>
> Why does this validation use the specific value MAX_FIELD_LENGTH = 1024 * 64?
> Is it possible to handle large Strings, such as those used to represent WKT
> geometries?
>
> Regards!
>
> --
> Julio César Tenganán Daza
> Ingeniero En Sistemas
> Universidad Del Valle
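
For anyone who wants to check the numbers in the reply above, here is a rough back-of-envelope sketch of the arithmetic. The constants are the ones quoted in the message (64K records per batch, 16 MB per vector, 64K maximum field length). The function names are made up for illustration and are not part of Drill, and the sketch assumes one byte per character and ignores the offset-vector overhead.

```python
# Back-of-envelope sketch of the vector-size arithmetic from the reply.
# Constants are taken from the message; function names are illustrative
# only and do not exist in Drill. Assumes one byte per character.

MAX_BATCH_RECORDS = 64 * 1024          # "up to 64K records" per batch
VECTOR_LIMIT_BYTES = 16 * 1024 * 1024  # the 16 MB vector size mentioned
MAX_FIELD_LENGTH = 1024 * 64           # current VarChar maximum


def vector_bytes(avg_width_bytes, records=MAX_BATCH_RECORDS):
    """Approximate data size of one VarChar vector (offsets ignored)."""
    return avg_width_bytes * records


def max_records_within_limit(avg_width_bytes, limit=VECTOR_LIMIT_BYTES):
    """How many records of the given average width fit under the limit."""
    return limit // avg_width_bytes


if __name__ == "__main__":
    # A full 64K-record batch of 256-byte strings lands exactly on 16 MB.
    print(vector_bytes(256) == VECTOR_LIMIT_BYTES)
    # At the 64K maximum field length, only a few hundred records fit.
    print(max_records_within_limit(MAX_FIELD_LENGTH))
```

Under these assumptions, an average width of 256 bytes is the point at which a full batch reaches the 16 MB figure, and any wider column would need a batch smaller than 64K records to stay under it.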
