Hi,

Am I right that your main fear is that if you allow users to "manually"
modify a DataSet<Row>, you lose control of the types, etc.?

I think that integrating the expression API into the existing API is nicer,
because it gives users more flexibility. It should also lead to a lower
overall complexity, right? I'm in favor of keeping things as simple as
possible.

Robert

On Thu, Jan 29, 2015 at 4:47 PM, Aljoscha Krettek <aljos...@apache.org>
wrote:

> Hi,
> I have to decide whether to expose the implementation or hide it from
> the user and would like to hear some opinions about that.
>
> The expression operations operate on DataSet[Row], where Row is
> basically a wrapper for an array of elements of different types. The
> expression API system keeps track of the names and types of these
> fields. Right now, when you have an operation like:
>
> // 'foo and 'bar are Scala symbols
> // they refer to fields named foo and
> // bar in the input data set
> val result = in.select('foo, 'bar)
>
> the result is a DataSet[Row]. This means two things:
>
> 1. The user can theoretically do a map operation on this where he
> manually accesses row fields, as in:
>
> in.map { row => (row.getField(0).asInstanceOf[Int],
> row.getField(1).asInstanceOf[String]) }
>
> 2. I cannot easily look at the whole structure of a query, because
> queries are translated to DataSet operations one expression at a
> time. For example:
>
> val result = in1.join(in2).filter(...).select(...)
>
> results in a join operation, followed by a filter operation, followed
> by a map operation. If the translation did not happen one operator
> at a time, we could combine all the operations into one join
> operation. This would mean having a custom optimiser component for the
> expression API and bypassing the optimiser component we have for
> normal operator data flows.
>
> The question now is: should I expose it as is, i.e. let expression
> operations result in DataSet[Row], or should I hide it behind another
> type of DataSet (ExpressionDataSet) so that we can later change the
> implementation details and perform any magic we want behind the
> scenes?
>
> Cheers,
> Aljoscha
>
