Re: Aggregations with scala pairs

2016-08-18 Thread Andrés Ivaldi
Thanks!!!

On Thu, Aug 18, 2016 at 3:35 AM, Jean-Baptiste Onofré 
wrote:

> Agreed.
>
> Regards
> JB
> On Aug 18, 2016, at 07:32, Olivier Girardot wrote:
>>
>> CC'ing dev list,
>> you should open a Jira and a PR related to it to discuss it c.f.
>> https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-ContributingCodeChanges
>>
>>
>>
>> On Wed, Aug 17, 2016 4:01 PM, Andrés Ivaldi iaiva...@gmail.com wrote:
>>
>>> Hello, I'd like to report incorrect behavior in the Dataset API, but I
>>> don't know how to do that. My Jira account doesn't allow me to create an
>>> issue.
>>>
>>> I'm using Apache Spark 2.0.0, but the problem has existed since at least
>>> version 1.4 (according to the docs, since 1.3).
>>>
>>> The problem is simple to reproduce, and so is the workaround. If we apply
>>> agg to a Dataset with Scala pairs that refer to the same column, only one
>>> aggregation on that column is actually used. This is because toMap reduces
>>> the pairs that share the same key to a single entry, overwriting the value.
>>>
>>> Class:
>>> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala
>>>
>>>
>>>   def agg(aggExpr: (String, String), aggExprs: (String, String)*): DataFrame = {
>>>     agg((aggExpr +: aggExprs).toMap)
>>>   }
>>>
>>>
>>> Rewriting it as something like this should work:
>>>
>>>   def agg(aggExpr: (String, String), aggExprs: (String, String)*): DataFrame = {
>>>     toDF((aggExpr +: aggExprs).map { pairExpr =>
>>>       strToExpr(pairExpr._2)(df(pairExpr._1).expr)
>>>     }.toSeq)
>>>   }
>>>
>>>
>>> regards
>>> --
>>> Ing. Ivaldi Andres
>>>
>>
>>
>> *Olivier Girardot*   | Associé
>> o.girar...@lateral-thoughts.com
>> +33 6 24 09 17 94
>>
>


-- 
Ing. Ivaldi Andres


Re: Aggregations with scala pairs

2016-08-18 Thread Jean-Baptiste Onofré
Agreed.

Regards
JB



On Aug 18, 2016, at 07:32, Olivier Girardot wrote:
>CC'ing dev list, you should open a Jira and a PR related to it to
>discuss it c.f.
>https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-ContributingCodeChanges
>
>
>
>
>
>On Wed, Aug 17, 2016 4:01 PM, Andrés Ivaldi iaiva...@gmail.com wrote:
>Hello, I'd like to report incorrect behavior in the Dataset API, but I
>don't know how to do that. My Jira account doesn't allow me to create an
>issue.
>
>I'm using Apache Spark 2.0.0, but the problem has existed since at least
>version 1.4 (according to the docs, since 1.3).
>
>The problem is simple to reproduce, and so is the workaround. If we apply
>agg to a Dataset with Scala pairs that refer to the same column, only one
>aggregation on that column is actually used. This is because toMap reduces
>the pairs that share the same key to a single entry, overwriting the value.
>
>Class:
>https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala
>
>
>  def agg(aggExpr: (String, String), aggExprs: (String, String)*): DataFrame = {
>    agg((aggExpr +: aggExprs).toMap)
>  }
>
>
>Rewriting it as something like this should work:
>
>  def agg(aggExpr: (String, String), aggExprs: (String, String)*): DataFrame = {
>    toDF((aggExpr +: aggExprs).map { pairExpr =>
>      strToExpr(pairExpr._2)(df(pairExpr._1).expr)
>    }.toSeq)
>  }
>
>regards
>--
>Ing. Ivaldi Andres
>
>
>Olivier Girardot | Associé
>o.girar...@lateral-thoughts.com
>+33 6 24 09 17 94


Re: Aggregations with scala pairs

2016-08-18 Thread Olivier Girardot
CC'ing the dev list. You should open a Jira issue and a related PR to discuss
it, cf.
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-ContributingCodeChanges





On Wed, Aug 17, 2016 4:01 PM, Andrés Ivaldi iaiva...@gmail.com wrote:
> Hello, I'd like to report incorrect behavior in the Dataset API, but I
> don't know how to do that. My Jira account doesn't allow me to create an
> issue.
>
> I'm using Apache Spark 2.0.0, but the problem has existed since at least
> version 1.4 (according to the docs, since 1.3).
>
> The problem is simple to reproduce, and so is the workaround. If we apply
> agg to a Dataset with Scala pairs that refer to the same column, only one
> aggregation on that column is actually used. This is because toMap reduces
> the pairs that share the same key to a single entry, overwriting the value.
>
> Class:
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala
>
>
>   def agg(aggExpr: (String, String), aggExprs: (String, String)*): DataFrame = {
>     agg((aggExpr +: aggExprs).toMap)
>   }
>
>
> Rewriting it as something like this should work:
>
>   def agg(aggExpr: (String, String), aggExprs: (String, String)*): DataFrame = {
>     toDF((aggExpr +: aggExprs).map { pairExpr =>
>       strToExpr(pairExpr._2)(df(pairExpr._1).expr)
>     }.toSeq)
>   }
>
> regards
> --
> Ing. Ivaldi Andres


Olivier Girardot | Associé
o.girar...@lateral-thoughts.com
+33 6 24 09 17 94

Aggregations with scala pairs

2016-08-17 Thread Andrés Ivaldi
Hello, I'd like to report incorrect behavior in the Dataset API, but I don't
know how to do that. My Jira account doesn't allow me to create an issue.

I'm using Apache Spark 2.0.0, but the problem has existed since at least
version 1.4 (according to the docs, since 1.3).

The problem is simple to reproduce, and so is the workaround. If we apply agg
to a Dataset with Scala pairs that refer to the same column, only one
aggregation on that column is actually used. This is because toMap reduces
the pairs that share the same key to a single entry, overwriting the value.
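
The collapse described above can be reproduced with plain Scala collections,
without Spark. This is only a minimal sketch: the object name and the column
name "value" are made up for illustration, and the strings it builds stand in
for the real aggregate expressions.

```scala
object ToMapCollapse {
  // Two aggregations requested on the same (hypothetical) column "value".
  val pairs: Seq[(String, String)] = Seq("value" -> "min", "value" -> "max")

  // A Map holds one entry per key, so the later pair overwrites the earlier one.
  val asMap: Map[String, String] = pairs.toMap

  // Mapping over the sequence directly keeps every pair, in order.
  val asCalls: Seq[String] =
    pairs.map { case (column, function) => s"$function($column)" }

  def main(args: Array[String]): Unit = {
    println(asMap)   // Map(value -> max)
    println(asCalls) // List(min(value), max(value))
  }
}
```

Only the "max" entry survives the conversion to a Map, which matches the
behavior reported for the pair-based agg.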

class
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala


  def agg(aggExpr: (String, String), aggExprs: (String, String)*): DataFrame = {
    agg((aggExpr +: aggExprs).toMap)
  }


Rewriting it as something like this should work:

  def agg(aggExpr: (String, String), aggExprs: (String, String)*): DataFrame = {
    toDF((aggExpr +: aggExprs).map { pairExpr =>
      strToExpr(pairExpr._2)(df(pairExpr._1).expr)
    }.toSeq)
  }
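
Until the pair-based overload is changed, one possible workaround is the
Column-based agg overload, which does not go through a Map. The sketch below
assumes Spark 2.x on the classpath and a local master; the object name, the
helper minAndMax, the sample data, and the column names are illustrative only
and not part of Spark.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{max, min}

object AggWorkaround {
  // Hypothetical helper: passes each aggregation as its own Column,
  // using the agg(Column, Column*) overload instead of string pairs.
  def minAndMax(df: DataFrame, keyCol: String, valueCol: String): DataFrame =
    df.groupBy(keyCol).agg(min(valueCol), max(valueCol))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("agg-pairs-workaround")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(("a", 1), ("a", 5), ("b", 3)).toDF("key", "value")

    // Pair-based form from the report: on 2.0.0 only one of the two
    // aggregations on "value" is reported to survive.
    df.groupBy("key").agg("value" -> "min", "value" -> "max").show()

    // Column-based form: one result column per aggregation.
    minAndMax(df, "key", "value").show()

    spark.stop()
  }
}
```

The Column-based call should return the grouping column plus one column for
each aggregation, so both min and max on "value" are kept.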


regards
-- 
Ing. Ivaldi Andres