Yes, that's what I'm afraid of. I would also like to have the
flexibility to change the underlying implementation in the future, if
we want to. And no, hiding the implementation would actually make
things easier, implementation-wise.
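
A minimal sketch of the hiding option, for illustration only. It assumes
plain Scala collections as a stand-in for DataSet, uses strings instead of
Scala symbols for field names, and every name in it (Row as defined here,
ExpressionDataSet, fromTuples, as) is hypothetical rather than an existing
API:

```scala
// Hypothetical sketch: Seq stands in for DataSet, and none of these
// names are taken from an actual API.
final case class Row(fields: Vector[Any])

// Wrapper that keeps the rows private and tracks the field names itself,
// so users cannot reach into a Row by position behind the system's back.
final class ExpressionDataSet private (
    private val rows: Seq[Row],
    val fieldNames: Vector[String]) {

  // Select fields by name; the wrapper resolves names to positions.
  def select(names: String*): ExpressionDataSet = {
    val indices = names.map { name =>
      val i = fieldNames.indexOf(name)
      require(i >= 0, s"unknown field: $name")
      i
    }
    val projected = rows.map(r => Row(indices.map(r.fields).toVector))
    new ExpressionDataSet(projected, names.toVector)
  }

  // The only exit point: the caller states how to turn fields into a T.
  def as[T](convert: Vector[Any] => T): Seq[T] =
    rows.map(r => convert(r.fields))
}

object ExpressionDataSet {
  // Build a wrapped data set from tuples plus the names of their fields.
  def fromTuples(names: Vector[String], data: Seq[Product]): ExpressionDataSet =
    new ExpressionDataSet(data.map(p => Row(p.productIterator.toVector)), names)
}

object Demo {
  def main(args: Array[String]): Unit = {
    val in = ExpressionDataSet.fromTuples(
      Vector("foo", "bar", "baz"),
      Seq((1, "a", 2.0), (2, "b", 3.0)))
    val result = in.select("foo", "bar")
    // Leaving the expression API is explicit, not an accidental map over rows.
    val typed = result.as(f => (f(0).asInstanceOf[Int], f(1).asInstanceOf[String]))
    println(typed)
  }
}
```

Because the rows are only reachable through the wrapper, what happens
between select and as could later be replaced (for example by collecting a
whole query before translating it) without changing user code.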

On Fri, Jan 30, 2015 at 7:08 PM, Robert Metzger <rmetz...@apache.org> wrote:
> Hi,
>
> Am I right that you basically fear that if you allow users to
> "manually" modify DataSet<Row>s, you lose control of the types
> etc.?
>
> I think that integrating the expression API into the existing API is nicer,
> because it gives users more flexibility. It should also lead to a lower
> overall complexity, right? I'm in favor of keeping things as simple as
> possible.
>
> Robert
>
> On Thu, Jan 29, 2015 at 4:47 PM, Aljoscha Krettek <aljos...@apache.org>
> wrote:
>
>> Hi,
>> I have to decide whether to expose the implementation or hide it from
>> the user and would like to hear some opinions about that.
>>
>> The expression operations operate on DataSet[Row], where Row is
>> basically a wrapper for an array of elements of different types. The
>> expression API system keeps track of the names and types of these
>> fields. Right now, when you have an operation like:
>>
>> // 'foo and 'bar are Scala symbols
>> // they refer to fields named foo and
>> // bar in the input data set
>> val result = in.select('foo, 'bar)
>>
>> the result is a DataSet[Row]. This means two things:
>>
>> 1. The user can theoretically do a map operation on this where he
>> manually accesses row fields, as in:
>>
>> in.map { row => (row.getField(0).asInstanceOf[Int],
>> row.getField(1).asInstanceOf[String]) }
>>
>> 2. I cannot easily look at the whole structure of a query, because
>> queries are translated to DataSet operations one expression at a
>> time, i.e.:
>>
>> val result = in1.join(in2).filter(...).select(...)
>>
>> results in a join operation, followed by a filter operation, followed
>> by a map operation. If the translation did not happen one operator
>> at a time, we could combine all the operations into one join
>> operation. This would mean having a custom optimiser component for the
>> expression API and bypassing the optimiser component we have for
>> normal operator data flows.
>>
>> The question now is: should I expose it as is, i.e. let expression
>> operations result in DataSet[Row], or should I hide it behind another
>> type of DataSet (ExpressionDataSet) so that we can later on change the
>> implementation details and perform any magic we want behind the
>> scenes?
>>
>> Cheers,
>> Aljoscha
>>
