Add a +1 from me as well. Just managed to finish going over it. Thanks Bobby for leading this effort!
Regards,
Mridul

On Wed, May 29, 2019 at 2:51 PM Tom Graves <tgraves...@yahoo.com.invalid> wrote:
>
> Ok, I'm going to call this vote and send the result email. We had 9 +1's (4
> binding) and 1 +0 and no -1's.
>
> Tom
>
> On Monday, May 27, 2019, 3:25:14 PM CDT, Felix Cheung
> <felixcheun...@hotmail.com> wrote:
>
> +1
>
> I’d prefer to see more of the end goal and how that could be achieved (such
> as ETL or SPARK-24579). However, given the rounds and months of discussions, we
> have come down to just the public API.
>
> If the community thinks a new set of public API is maintainable, I don’t see
> any problem with that.
>
> ________________________________
> From: Tom Graves <tgraves...@yahoo.com.INVALID>
> Sent: Sunday, May 26, 2019 8:22:59 AM
> To: hol...@pigscanfly.ca; Reynold Xin
> Cc: Bobby Evans; DB Tsai; Dongjoon Hyun; Imran Rashid; Jason Lowe; Matei
> Zaharia; Thomas graves; Xiangrui Meng; Xiangrui Meng; dev
> Subject: Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar
> Processing Support
>
> More feedback would be great. This has been open a long time though, so let's
> extend til Wednesday the 29th and see where we are.
>
> Tom
>
> On Sat, May 25, 2019 at 6:28 PM, Holden Karau
> <hol...@pigscanfly.ca> wrote:
> Same, I meant to catch up after KubeCon but had some unexpected travels.
>
> On Sat, May 25, 2019 at 10:56 PM Reynold Xin <r...@databricks.com> wrote:
>
> Can we push this to June 1st? I have been meaning to read it but
> unfortunately keep traveling...
>
> On Sat, May 25, 2019 at 8:31 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>
> +1
>
> Thanks,
> Dongjoon.
>
> On Fri, May 24, 2019 at 17:03 DB Tsai <dbt...@dbtsai.com.invalid> wrote:
>
> +1 on exposing the APIs for columnar processing support.
>
> I understand that the scope of this SPIP doesn't cover AI / ML
> use-cases.
> But I saw a good performance gain when I converted data
> from rows to columns to leverage SIMD architectures in a POC ML
> application.
>
> With the exposed columnar processing support, I can imagine that the
> heavy lifting parts of ML applications (such as computing the
> objective functions) can be written as columnar expressions that
> leverage SIMD architectures to get a good speedup.
>
> Sincerely,
>
> DB Tsai
> ----------------------------------------------------------
> Web: https://www.dbtsai.com
> PGP Key ID: 42E5B25A8F7A82C1
>
> On Wed, May 15, 2019 at 2:59 PM Bobby Evans <reva...@gmail.com> wrote:
> >
> > It would allow for the columnar processing to be extended through the
> > shuffle. So if I were doing, say, an FPGA-accelerated extension, it could
> > replace the ShuffleExchangeExec with one that can take a ColumnarBatch as
> > input instead of a Row. The extended version of the ShuffleExchangeExec
> > could then do the partitioning on the incoming batch and, instead of
> > producing a ShuffledRowRDD for the exchange, it could produce something
> > like a ShuffleBatchRDD that would let the serializing and deserializing
> > happen in a column-based format for a faster exchange, assuming that
> > columnar processing is also happening after the exchange. This is just like
> > providing a columnar version of any other Catalyst operator, except in this
> > case it is a bit more complex of an operator.
> >
> > On Wed, May 15, 2019 at 12:15 PM Imran Rashid
> > <iras...@cloudera.com.invalid> wrote:
> >>
> >> Sorry I am late to the discussion here -- the JIRA mentions using these
> >> extensions for dealing with shuffles, can you explain that part? I don't
> >> see how you would use this to change shuffle behavior at all.
> >>
> >> On Tue, May 14, 2019 at 10:59 AM Thomas graves <tgra...@apache.org> wrote:
> >>>
> >>> Thanks for replying. I'll extend the vote til May 26th to allow feedback
> >>> from you and others who haven't had time to look at it.
> >>>
> >>> Tom
> >>>
> >>> On Mon, May 13, 2019 at 4:43 PM Holden Karau <hol...@pigscanfly.ca> wrote:
> >>> >
> >>> > I’d like to ask for this vote period to be extended. I’m interested but I
> >>> > don’t have the cycles to review it in detail and make an informed vote
> >>> > until the 25th.
> >>> >
> >>> > On Tue, May 14, 2019 at 1:49 AM Xiangrui Meng <m...@databricks.com>
> >>> > wrote:
> >>> >>
> >>> >> My vote is 0. Since the updated SPIP focuses on ETL use cases, I don't
> >>> >> feel strongly about it. I would still suggest doing the following:
> >>> >>
> >>> >> 1. Link the POC mentioned in Q4 so people can verify the POC result.
> >>> >> 2. List public APIs we plan to expose in Appendix A. I did a quick
> >>> >> check. Besides ColumnarBatch and ColumnVector, we also need to make
> >>> >> the following public. People who are familiar with SQL internals
> >>> >> should help assess the risk.
> >>> >> * ColumnarArray
> >>> >> * ColumnarMap
> >>> >> * unsafe.types.CalendarInterval
> >>> >> * ColumnarRow
> >>> >> * UTF8String
> >>> >> * ArrayData
> >>> >> * ...
> >>> >> 3. I still feel using Pandas UDF as the mid-term success doesn't match
> >>> >> the purpose of this SPIP. It does make some code cleaner. But I guess
> >>> >> for ETL use cases, it won't bring much value.
> >>> >>
> >>> > --
> >>> > Twitter: https://twitter.com/holdenkarau
> >>> > Books (Learning Spark, High Performance Spark, etc.):
> >>> > https://amzn.to/2MaRAG9
> >>> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
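
For readers following Bobby's description of a columnar exchange earlier in the thread, here is a minimal Python sketch of the idea: compute partition ids from the key column only, gather each column per partition, and serialize each partition in a column-oriented layout. This is a toy model under stated assumptions, not Spark code. The names Batch, partition_batch, serialize_batch and deserialize_batch are hypothetical; a plain dict of lists stands in for ColumnarBatch, and JSON stands in for whatever wire format a real columnar shuffle might use.

```python
import json
from typing import Dict, List

# Toy stand-in for a columnar batch: column name -> list of values.
# All columns are assumed to have the same length. This is NOT Spark's
# ColumnarBatch; it only models the column-oriented layout.
Batch = Dict[str, List]


def partition_batch(batch: Batch, key: str, num_partitions: int) -> List[Batch]:
    """Split one columnar batch into one columnar batch per partition."""
    # Step 1: derive a partition id per row from the key column alone.
    part_ids = [hash(v) % num_partitions for v in batch[key]]
    # Step 2: gather every column by partition id; rows are never assembled.
    out: List[Batch] = [{name: [] for name in batch} for _ in range(num_partitions)]
    for name, column in batch.items():
        for pid, value in zip(part_ids, column):
            out[pid][name].append(value)
    return out


def serialize_batch(batch: Batch) -> bytes:
    """Encode a batch column by column (JSON is an assumption for illustration)."""
    return json.dumps(batch).encode("utf-8")


def deserialize_batch(payload: bytes) -> Batch:
    """Decode a payload produced by serialize_batch back into a batch."""
    return json.loads(payload.decode("utf-8"))
```

With integer keys 0 through 5 and two partitions, even ids end up in partition 0 and odd ids in partition 1, and each partition stays in columnar form on both sides of the exchange. That is the property Bobby's message relies on when it assumes columnar processing continues after the shuffle.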