Re: OrderBy Year and Month is not displaying correctly

2020-01-06 Thread Mich Talebzadeh
*The distinct transformation does not preserve order, you need to distinct
first, then orderby.*

Thanks Enrico. You are correct. Worked fine!

joint_accounts.
  select(year(col("transactiondate")).as("Year")
, month(col("transactiondate")).as("Month")
, sum("moneyin").over(wSpec).cast("DECIMAL(10,2)").as("incoming Per
Month")
, sum("moneyout").over(wSpec).cast("DECIMAL(10,2)").as("outgoing Per
Month")).
  withColumn(("incoming Per Month"),format_number(col("incoming Per
Month"),2)).
  withColumn(("outgoing Per Month"),format_number(col("outgoing Per
Month"),2)).


* distinct.  orderBy(col("Year"),col("Month")). *
show(1000,false)

++-+--+--+
|Year|Month|incoming Per Month|outgoing Per Month|
++-+--+--+
|2019|9|13,958.58 |17,920.31 |
|2019|10   |10,029.00 |10,067.52 |

|2019|11   |4,032.30  |4,225.30  |
|2019|12   |742.00|814.49|
|2020|1|1,570.00  |1,540.86  |
++-+--+--+

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Mon, 6 Jan 2020 at 09:35, Gourav Sengupta 
wrote:

> or just use SQL, which is less verbose, easily readable, and takes care of
> all such scenarios. But for some weird reason I have found that people
> using data frame API's have a perception that using SQL is less
> intelligent. But I think that using less effort to get better output can me
> a measure of intelligence.
>
> Regards,
> Gourav Sengupta
>
> On Mon, Jan 6, 2020 at 9:23 AM Enrico Minack 
> wrote:
>
>> The distinct transformation does not preserve order, you need to distinct
>> first, then orderby.
>>
>> Enrico
>>
>>
>> Am 06.01.20 um 00:39 schrieb Mich Talebzadeh:
>>
>> Hi,
>>
>> I am working out monthly outgoing etc from an account and I am using the
>> following code
>>
>> import org.apache.spark.sql.expressions.Window
>> val wSpec =
>> Window.partitionBy(year(col("transactiondate")),month(col("transactiondate")))
>> joint_accounts.
>>   select(year(col("transactiondate")).as("Year")
>> , month(col("transactiondate")).as("Month")
>> , sum("moneyin").over(wSpec).cast("DECIMAL(10,2)").as("incoming Per
>> Month")
>> , sum("moneyout").over(wSpec).cast("DECIMAL(10,2)").as("outgoing Per
>> Month")).
>>
>> *orderBy(year(col("transactiondate")),month(col("transactiondate"))).*
>> distinct.
>> show(1000,false)
>>
>> This shows as follows:
>>
>>
>> |Year|Month|incoming Per Month|outgoing Per Month|
>> ++-+--+--+
>> |2019|9|13958.58  |17920.31  |
>> |2019|11   |4032.30   |4225.30   |
>> |2020|1|1530.00   |1426.91   |
>> |2019|10   |10029.00  |10067.52  |
>> |2019|12   |742.00|814.49|
>> ++-+--+--+
>>
>>  however the orderby is not correct as I expect to see 2010 record and
>> 2019 records in the order of year and month.
>>
>> Any suggestions?
>>
>> Thanks,
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn * 
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> *
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>>
>>


Re: OrderBy Year and Month is not displaying correctly

2020-01-06 Thread Gourav Sengupta
or just use SQL, which is less verbose, easily readable, and takes care of
all such scenarios. But for some weird reason I have found that people
using data frame API's have a perception that using SQL is less
intelligent. But I think that using less effort to get better output can me
a measure of intelligence.

Regards,
Gourav Sengupta

On Mon, Jan 6, 2020 at 9:23 AM Enrico Minack  wrote:

> The distinct transformation does not preserve order, you need to distinct
> first, then orderby.
>
> Enrico
>
>
> Am 06.01.20 um 00:39 schrieb Mich Talebzadeh:
>
> Hi,
>
> I am working out monthly outgoing etc from an account and I am using the
> following code
>
> import org.apache.spark.sql.expressions.Window
> val wSpec =
> Window.partitionBy(year(col("transactiondate")),month(col("transactiondate")))
> joint_accounts.
>   select(year(col("transactiondate")).as("Year")
> , month(col("transactiondate")).as("Month")
> , sum("moneyin").over(wSpec).cast("DECIMAL(10,2)").as("incoming Per
> Month")
> , sum("moneyout").over(wSpec).cast("DECIMAL(10,2)").as("outgoing Per
> Month")).
> *orderBy(year(col("transactiondate")),month(col("transactiondate"))).*
> distinct.
> show(1000,false)
>
> This shows as follows:
>
>
> |Year|Month|incoming Per Month|outgoing Per Month|
> ++-+--+--+
> |2019|9|13958.58  |17920.31  |
> |2019|11   |4032.30   |4225.30   |
> |2020|1|1530.00   |1426.91   |
> |2019|10   |10029.00  |10067.52  |
> |2019|12   |742.00|814.49|
> ++-+--+--+
>
>  however the orderby is not correct as I expect to see 2010 record and
> 2019 records in the order of year and month.
>
> Any suggestions?
>
> Thanks,
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> *
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
>


Re: OrderBy Year and Month is not displaying correctly

2020-01-06 Thread Enrico Minack
The distinct transformation does not preserve order, you need to 
distinct first, then orderby.


Enrico


Am 06.01.20 um 00:39 schrieb Mich Talebzadeh:

Hi,

I am working out monthly outgoing etc from an account and I am using 
the following code


import org.apache.spark.sql.expressions.Window
val wSpec = 
Window.partitionBy(year(col("transactiondate")),month(col("transactiondate")))

joint_accounts.
  select(year(col("transactiondate")).as("Year")
    , month(col("transactiondate")).as("Month")
    , sum("moneyin").over(wSpec).cast("DECIMAL(10,2)").as("incoming 
Per Month")
    , sum("moneyout").over(wSpec).cast("DECIMAL(10,2)").as("outgoing 
Per Month")).

*orderBy(year(col("transactiondate")),month(col("transactiondate"))).*
    distinct.
    show(1000,false)

This shows as follows:


|Year|Month|incoming Per Month|outgoing Per Month|
++-+--+--+
|2019|9    |13958.58  |17920.31  |
|2019|11   |4032.30   |4225.30   |
|2020|1    |1530.00   |1426.91   |
|2019|10   |10029.00  |10067.52  |
|2019|12   |742.00    |814.49    |
++-+--+--+

 however the orderby is not correct as I expect to see 2010 record and 
2019 records in the order of year and month.


Any suggestions?

Thanks,

Dr Mich Talebzadeh

LinkedIn 
/https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw/


http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk.Any and all responsibility for 
any loss, damage or destruction of data or any other property which 
may arise from relying on this email's technical content is explicitly 
disclaimed. The author will in no case be liable for any monetary 
damages arising from such loss, damage or destruction.






OrderBy Year and Month is not displaying correctly

2020-01-05 Thread Mich Talebzadeh
Hi,

I am working out monthly outgoing etc from an account and I am using the
following code

import org.apache.spark.sql.expressions.Window
val wSpec =
Window.partitionBy(year(col("transactiondate")),month(col("transactiondate")))
joint_accounts.
  select(year(col("transactiondate")).as("Year")
, month(col("transactiondate")).as("Month")
, sum("moneyin").over(wSpec).cast("DECIMAL(10,2)").as("incoming Per
Month")
, sum("moneyout").over(wSpec).cast("DECIMAL(10,2)").as("outgoing Per
Month")).
*orderBy(year(col("transactiondate")),month(col("transactiondate"))).*
distinct.
show(1000,false)

This shows as follows:


|Year|Month|incoming Per Month|outgoing Per Month|
++-+--+--+
|2019|9|13958.58  |17920.31  |
|2019|11   |4032.30   |4225.30   |
|2020|1|1530.00   |1426.91   |
|2019|10   |10029.00  |10067.52  |
|2019|12   |742.00|814.49|
++-+--+--+

 however the orderby is not correct as I expect to see 2010 record and 2019
records in the order of year and month.

Any suggestions?

Thanks,

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.