Hi,

Am I right that your main concern is that, if you allow users to "manually" modify a DataSet<Row>, you lose control of the types etc.?
I think that integrating the expression API into the existing API is nicer, because it gives users more flexibility. It should also lead to a lower overall complexity, right? I'm in favor of keeping things as simple as possible.

Robert

On Thu, Jan 29, 2015 at 4:47 PM, Aljoscha Krettek <aljos...@apache.org> wrote:

> Hi,
> I have to decide whether to expose the implementation or hide it from
> the user and would like to hear some opinions about that.
>
> The expression operations operate on DataSet[Row], where Row is
> basically a wrapper for an array of elements of different types. The
> expression API system keeps track of the names and types of these
> fields. Right now, when you have an operation like:
>
> // 'foo and 'bar are Scala symbols
> // they refer to fields named foo and
> // bar in the input data set
> val result = in.select('foo, 'bar)
>
> the result is a DataSet[Row]. This means two things:
>
> 1. The user can theoretically do a map operation on this where he
> manually accesses row fields, as in:
>
> in.map { row => (row.getField(0).asInstanceOf[Int],
>   row.getField(1).asInstanceOf[String]) }
>
> 2. I cannot easily look at the whole structure of a query, because
> queries are translated to DataSet operations one expression at a
> time, i.e.:
>
> val result = in1.join(in2).filter(...).select(...)
>
> results in a join operation, followed by a filter operation, followed
> by a map operation. If the translation did not happen one operator
> at a time, we could combine all the operations into one join
> operation. This would mean having a custom optimiser component for the
> expression API and bypassing the optimiser component we have for
> normal operator data flows.
>
> The question now is: should I expose it as is, i.e. let expression
> operations result in DataSet[Row], or should I hide it behind another
> type of DataSet (ExpressionDataSet) so that we can later on change the
> implementation details and perform any magic we want behind the
> scenes?
>
> Cheers,
> Aljoscha
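
For illustration only, the "hide it behind another type" option from the quoted message could look roughly like the sketch below. It is plain Scala with no Flink dependency. Row and ExpressionDataSet here are hypothetical stand-ins written for this example, not the actual classes, and the method names (select, fieldNames, toRows) and the use of string field names instead of Scala symbols are assumptions. The idea is that the wrapper keeps the rows private and tracks field names and types itself, so a user only gets at untyped rows through one explicit conversion.

```scala
// Hypothetical stand-in, not a Flink class: a Row wraps an untyped array of fields.
final case class Row(fields: Vector[Any]) {
  def getField(i: Int): Any = fields(i)
}

// Hypothetical wrapper: schema and rows are constructor parameters without `val`,
// so they are not visible to callers.
final class ExpressionDataSet(schema: Vector[(String, Class[_])], rows: Vector[Row]) {

  // Select fields by name; an unknown name is rejected when the query is built.
  def select(names: String*): ExpressionDataSet = {
    val indices = names.toVector.map { name =>
      val i = schema.indexWhere(_._1 == name)
      require(i >= 0, s"unknown field: $name")
      i
    }
    // Project both the schema and every row onto the selected indices.
    new ExpressionDataSet(indices.map(schema), rows.map(r => Row(indices.map(r.fields))))
  }

  // Names of the fields currently tracked by the wrapper.
  def fieldNames: Vector[String] = schema.map(_._1)

  // The single, explicit way to leave the wrapper and work on raw rows.
  def toRows: Vector[Row] = rows
}

// Small in-memory input with three named fields.
val in = new ExpressionDataSet(
  Vector(("foo", classOf[Int]), ("bar", classOf[String]), ("baz", classOf[Double])),
  Vector(Row(Vector(1, "a", 1.0)), Row(Vector(2, "b", 2.0))))

val result = in.select("foo", "bar")
```

Because manual access such as row.getField(0).asInstanceOf[Int] is only possible after calling toRows, the internals behind select could later be changed (for example, to combine several operations into one) without breaking user code that stays inside the wrapper. The trade-off raised in the reply still applies: users lose the flexibility of treating the result as an ordinary data set.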