Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

Mridul Muralidharan Wed, 29 May 2019 15:40:56 -0700

Add a +1 from me as well.
Just managed to finish going over it.

Thanks, Bobby, for leading this effort!


Regards,
Mridul

On Wed, May 29, 2019 at 2:51 PM Tom Graves <tgraves...@yahoo.com.invalid> wrote:
>
> Ok, I'm going to call this vote and send the result email. We had 9 +1's (4 
> binding) and 1 +0 and no -1's.
>
> Tom
>
> On Monday, May 27, 2019, 3:25:14 PM CDT, Felix Cheung 
> <felixcheun...@hotmail.com> wrote:
>
>
> +1
>
> I’d prefer to see more of the end goal and how that could be achieved (such 
> as ETL or SPARK-24579). However, given the rounds and months of discussion, 
> we have come down to just the public API.
>
> If the community thinks a new set of public APIs is maintainable, I don’t see 
> any problem with that.
>
> ________________________________
> From: Tom Graves <tgraves...@yahoo.com.INVALID>
> Sent: Sunday, May 26, 2019 8:22:59 AM
> To: hol...@pigscanfly.ca; Reynold Xin
> Cc: Bobby Evans; DB Tsai; Dongjoon Hyun; Imran Rashid; Jason Lowe; Matei 
> Zaharia; Thomas graves; Xiangrui Meng; Xiangrui Meng; dev
> Subject: Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar 
> Processing Support
>
> More feedback would be great. This has been open a long time, though, so 
> let's extend until Wednesday the 29th and see where we are.
>
> Tom
>
> On Sat, May 25, 2019 at 6:28 PM, Holden Karau
> <hol...@pigscanfly.ca> wrote:
> Same, I meant to catch up after KubeCon but had some unexpected travels.
>
> On Sat, May 25, 2019 at 10:56 PM Reynold Xin <r...@databricks.com> wrote:
>
> Can we push this to June 1st? I have been meaning to read it but 
> unfortunately keep traveling...
>
> On Sat, May 25, 2019 at 8:31 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>
> +1
>
> Thanks,
> Dongjoon.
>
> On Fri, May 24, 2019 at 17:03 DB Tsai <dbt...@dbtsai.com.invalid> wrote:
>
> +1 on exposing the APIs for columnar processing support.
>
> I understand that the scope of this SPIP doesn't cover AI / ML
> use-cases. But I saw a good performance gain when I converted data
> from rows to columns to leverage SIMD architectures in a POC ML
> application.
>
> With the exposed columnar processing support, I can imagine that the
> heavy-lifting parts of ML applications (such as computing the
> objective functions) can be written as columnar expressions that
> leverage SIMD architectures to get a good speedup.
>
> Sincerely,
>
> DB Tsai
> ----------------------------------------------------------
> Web: https://www.dbtsai.com
> PGP Key ID: 42E5B25A8F7A82C1
>
> On Wed, May 15, 2019 at 2:59 PM Bobby Evans <reva...@gmail.com> wrote:
> >
> > It would allow for the columnar processing to be extended through the 
> > shuffle.  So if I were doing, say, an FPGA-accelerated extension, it could 
> > replace the ShuffleExchangeExec with one that can take a ColumnarBatch as 
> > input instead of a Row. The extended version of the ShuffleExchangeExec 
> > could then do the partitioning on the incoming batch, and instead of 
> > producing a ShuffledRowRDD for the exchange it could produce something 
> > like a ShuffleBatchRDD that would let the serializing and deserializing 
> > happen in a column-based format for a faster exchange, assuming that 
> > columnar processing is also happening after the exchange. This is just like 
> > providing a columnar version of any other Catalyst operator, except in this 
> > case it is a somewhat more complex operator.
> >
> > On Wed, May 15, 2019 at 12:15 PM Imran Rashid 
> > <iras...@cloudera.com.invalid> wrote:
> >>
> >> Sorry I am late to the discussion here -- the JIRA mentions using these 
> >> extensions for dealing with shuffles, can you explain that part?  I don't 
> >> see how you would use this to change shuffle behavior at all.
> >>
> >> On Tue, May 14, 2019 at 10:59 AM Thomas graves <tgra...@apache.org> wrote:
> >>>
> >>> Thanks for replying, I'll extend the vote until May 26th to allow feedback
> >>> from you and other people who haven't had time to look at it.
> >>>
> >>> Tom
> >>>
> >>> On Mon, May 13, 2019 at 4:43 PM Holden Karau <hol...@pigscanfly.ca> wrote:
> >>> >
> >>> > I’d like to ask for this vote period to be extended. I’m interested, but 
> >>> > I don’t have the cycles to review it in detail and make an informed vote 
> >>> > until the 25th.
> >>> >
> >>> > On Tue, May 14, 2019 at 1:49 AM Xiangrui Meng <m...@databricks.com> 
> >>> > wrote:
> >>> >>
> >>> >> My vote is 0. Since the updated SPIP focuses on ETL use cases, I don't 
> >>> >> feel strongly about it. I would still suggest doing the following:
> >>> >>
> >>> >> 1. Link the POC mentioned in Q4 so people can verify the POC result.
> >>> >> 2. List public APIs we plan to expose in Appendix A. I did a quick 
> >>> >> check. Besides ColumnarBatch and ColumnVector, we also need to make 
> >>> >> the following public. People who are familiar with SQL internals 
> >>> >> should help assess the risk.
> >>> >> * ColumnarArray
> >>> >> * ColumnarMap
> >>> >> * unsafe.types.CalendarInterval
> >>> >> * ColumnarRow
> >>> >> * UTF8String
> >>> >> * ArrayData
> >>> >> * ...
> >>> >> 3. I still feel using Pandas UDF as the mid-term success doesn't match 
> >>> >> the purpose of this SPIP. It does make some code cleaner. But I guess 
> >>> >> for ETL use cases, it won't bring much value.
> >>> >>
> >>> > --
> >>> > Twitter: https://twitter.com/holdenkarau
> >>> > Books (Learning Spark, High Performance Spark, etc.): 
> >>> > https://amzn.to/2MaRAG9
> >>> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> >>>

