Stefán - I don't think you are being unreasonable. I think the topics you've brought up are valid, and they have seriously tempered my willingness to use Avro more in my organization. Since we haven't committed to it yet, it is less of a priority for me, so I have been less vocal, but I would imagine the Drill community should take note of this. A "supported" (https://drill.apache.org/docs/querying-a-file-system-introduction/) file format certainly should be given more care than this. It sets a bad precedent and may scare off users.
While for you the option of removing Avro from the supported file formats would be a kick in the pants, I think the Drill project should consider the ramifications of stating that something is supported while offering very poor support for it. This is a huge issue for a project: it doesn't elicit trust, it frustrates users like yourself, and it may turn away users who are exploring the project. I think a rational discussion on this topic, with an outcome being decided upon (not left open), is very important to the Drill project as a whole, and I applaud your tenacity in bringing up these issues. Is it possible for you to join the weekly hangouts? It would be good to talk things out there.

John

On Fri, Apr 1, 2016 at 7:43 AM, Stefán Baxter <[email protected]> wrote:

> Hi,
>
> Is it at all possible that we are the only company trying to use Avro with
> Drill to some serious extent?
>
> We continue to come across all sorts of embarrassing shortcomings like the
> one we are dealing with now, where a schema change exception is thrown even
> when working with a single Avro file (that has the same schema).
>
> Can a non project member call for a discussion on this topic and the level
> of support that is offered for Avro in Drill?
>
> My discussion topics would be:
>
>    - Strange schema validation that ... :
>       ... currently fails on a single file
>       ... prevents dirX variables from working
>       ... would require Drill to scan all Avro files to establish schema (even
>       when pruning would be used)
>       ... would ALWAYS fail for old queries if an old Avro file, containing
>       the original fields, was removed and could not be scanned
>       ... does not rhyme with the "eliminate ETL" and "Evolving Schema" goals
>       of Drill
>
>    - Simple union types do not work to declare nullable fields
>
>    - Drill cannot read Parquet that is created by parquet-mr-avro
>
>    - What is the intention for Avro in Drill?
>       - Should we choose some other format to buffer/batch data before
>       creating a Parquet file for it?
>
>    - The culture here regarding talking about boring/hard topics like this
>       - Where serious complaints/issues are met with silence
>       - I know full well that my frustration shines through here and that it
>       is not helping, but this Drill+Avro mess is really getting too much for
>       us to handle
>
> Look forward to discussing this here or during the next hangout.
>
> Regards,
>  -Stefán (or ... mr. old & frustrated)
>
