Re: [Structured Streaming] Multiple sources best practice/recommendation

2017-09-14 Thread Michael Armbrust
I would probably suggest that you partition by format (though you can get
the file name from the build in function input_file_name()).  You can load
multiple streams from different directories and union them together as long
as the schema is the same after parsing.  Otherwise you can just run
multiple streams on the same cluster.

On Wed, Sep 13, 2017 at 7:56 AM, JG Perrin  wrote:

> Hi,
>
>
>
> I have different files being dumped on S3, I want to ingest them and join
> them.
>
>
>
> What does sound better to you? Have one “ directory” for all or one per
> file format?
>
>
>
> If I have one directory for all, can you get some metadata about the file,
> like its name?
>
>
>
> If multiple directory, how can I have multiple “listeners”?
>
>
>
> Thanks
>
>
>
> jg
> --
>
> This electronic transmission and any documents accompanying this
> electronic transmission contain confidential information belonging to the
> sender. This information may contain confidential health information that
> is legally privileged. The information is intended only for the use of the
> individual or entity named above. The authorized recipient of this
> transmission is prohibited from disclosing this information to any other
> party unless required to do so by law or regulation and is required to
> delete or destroy the information after its stated need has been fulfilled.
> If you are not the intended recipient, you are hereby notified that any
> disclosure, copying, distribution or the taking of any action in reliance
> on or regarding the contents of this electronically transmitted information
> is strictly prohibited. If you have received this E-mail in error, please
> notify the sender and delete this message immediately.
>


[Structured Streaming] Multiple sources best practice/recommendation

2017-09-13 Thread JG Perrin
Hi,

I have different files being dumped on S3, I want to ingest them and join them.

What does sound better to you? Have one " directory" for all or one per file 
format?

If I have one directory for all, can you get some metadata about the file, like 
its name?

If multiple directory, how can I have multiple "listeners"?

Thanks

jg

__
This electronic transmission and any documents accompanying this electronic 
transmission contain confidential information belonging to the sender.  This 
information may contain confidential health information that is legally 
privileged.  The information is intended only for the use of the individual or 
entity named above.  The authorized recipient of this transmission is 
prohibited from disclosing this information to any other party unless required 
to do so by law or regulation and is required to delete or destroy the 
information after its stated need has been fulfilled.  If you are not the 
intended recipient, you are hereby notified that any disclosure, copying, 
distribution or the taking of any action in reliance on or regarding the 
contents of this electronically transmitted information is strictly prohibited. 
 If you have received this E-mail in error, please notify the sender and delete 
this message immediately.