Re: [SPARK-26160] Make assertNotBucketed call in DataFrameWriter::save optional

Ryan Blue Mon, 10 Dec 2018 09:47:10 -0800

Anyone can attend the v2 sync. You just need to let me know which email
address you'd like to have added. Sorry that it is invite-only. That's a
limitation of the platform (Hangouts); the Spark community welcomes anyone
who wants to participate.


On Mon, Dec 10, 2018 at 1:00 AM JOAQUIN GUANTER GONZALBEZ <
joaquin.guantergonzal...@telefonica.com> wrote:

> Ah, yes, you are right. The DataSourceV2 APIs wouldn’t let an implementor
> mark a DataSet as “bucketed”. Is there any documentation about the upcoming
> table support for data source v2 or any way of getting invited to the
> DataSourceV2 community sync?
>
>
>
> Thanks!
>
> Ximo.
>
>
>
> *From:* Wenchen Fan <cloud0...@gmail.com>
> *Sent:* Wednesday, December 5, 2018 15:51
> *To:* JOAQUIN GUANTER GONZALBEZ <joaquin.guantergonzal...@telefonica.com>
> *CC:* Spark dev list <dev@spark.apache.org>
> *Subject:* Re: [SPARK-26160] Make assertNotBucketed call in
> DataFrameWriter::save optional
>
>
>
> The bucket feature is designed to work only with data sources that have
> table support, and table support is not public yet, which means no
> external data source can access bucketing information right now. The
> bucket feature only works with Spark native file source tables.
>
>
>
> We are working on adding table support to data source v2, and we should
> have a good story about bucketing when it's done.
>
>
>
> On Tue, Nov 27, 2018 at 1:01 AM JOAQUIN GUANTER GONZALBEZ <
> joaquin.guantergonzal...@telefonica.com> wrote:
>
> Hello,
>
>
>
> I have a proposal for a small improvement in the Datasource API and I’d
> like to know if it sounds like a change the Spark project would accept.
>
>
>
> Currently, the `.save` method in DataFrameWriter will fail if the
> dataframe is bucketed and/or sorted. This makes sense, since there is no
> way of storing metadata in the current file-based data sources to know
> whether a file was bucketed or not.
>
>
>
> I have a use case where I would like to implement a new, file-based data
> source which could keep track of that kind of metadata (without using the
> HiveMetastore), so I would like to be able to `.save` bucketed dataframes.
>
>
>
> Would a patch to extend the data source API with an indicator of whether
> a source is able to serialize bucketed dataframes be a welcome addition?
> I'm happy to work on it if that’s the case.
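
To make the proposal above concrete, here is a minimal sketch in plain Python rather than Spark code. Everything in it is hypothetical: `ToySource`, `ToyWriter`, the `supports_bucketing` flag and the error message are illustrative names, not Spark APIs. It only models the idea that `save` would skip the bucketing check when the source declares that it can keep that metadata itself.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class AnalysisError(Exception):
    """Stand-in for the error raised when a write is rejected."""


@dataclass
class ToySource:
    """Hypothetical data source; `supports_bucketing` is the proposed indicator."""
    name: str
    supports_bucketing: bool = False  # default keeps the current behaviour


@dataclass
class ToyWriter:
    """Toy stand-in for a DataFrameWriter; not Spark's implementation."""
    source: ToySource
    num_buckets: Optional[int] = None
    bucket_columns: List[str] = field(default_factory=list)

    def bucket_by(self, num_buckets: int, *columns: str) -> "ToyWriter":
        # Record the requested bucketing and return self for chaining.
        self.num_buckets = num_buckets
        self.bucket_columns = list(columns)
        return self

    def _assert_not_bucketed(self, operation: str) -> None:
        # Reject the write if any bucketing was requested.
        if self.num_buckets is not None:
            raise AnalysisError(f"'{operation}' does not support bucketing")

    def save(self, path: str) -> str:
        # Proposed change: only run the check when the source cannot
        # record bucketing metadata itself.
        if not self.source.supports_bucketing:
            self._assert_not_bucketed("save")
        return f"wrote {path} via {self.source.name}"
```

With this shape, sources that do not opt in keep rejecting bucketed writes exactly as before, and only a source that sets the flag accepts them.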
>
>
>
> I have opened this as https://issues.apache.org/jira/browse/SPARK-26160
> in the Spark Jira.
>
>
>
> Cheers,
>
> Ximo.
>
>
> ------------------------------
>
>
> The information contained in this transmission is privileged and
> confidential information intended only for the use of the individual or
> entity named above. If the reader of this message is not the intended
> recipient, you are hereby notified that any dissemination, distribution or
> copying of this communication is strictly prohibited. If you have received
> this transmission in error, do not read it. Please immediately reply to the
> sender that you have received this communication in error and then delete
> it.
>
> Esta mensagem e seus anexos se dirigem exclusivamente ao seu destinatário,
> pode conter informação privilegiada ou confidencial e é para uso exclusivo
> da pessoa ou entidade de destino. Se não é vossa senhoria o destinatário
> indicado, fica notificado de que a leitura, utilização, divulgação e/ou
> cópia sem autorização pode estar proibida em virtude da legislação vigente.
> Se recebeu esta mensagem por erro, rogamos-lhe que nos o comunique
> imediatamente por esta mesma via e proceda a sua destruição
>


-- 
Ryan Blue
Software Engineer
Netflix
