Forgot to mention that one particularly pesky issue we found in the work on
Redshift is being able to write unit tests for this.

Is there an embedded version of Snowflake to run those against? If possible, I
would also like to get some ideas on how to test this use case.
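
In the meantime, one workaround (just a rough sketch with made-up names, not
code from either PR) could be to emulate the stage with local temp files and
unit test only the parallel file-read half, faking the COPY step elsewhere:

import java.nio.file.Files;
import java.util.Arrays;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.testing.PAssert;
import org.apache.beam.sdk.testing.TestPipeline;
import org.apache.beam.sdk.values.PCollection;
import org.junit.Rule;
import org.junit.Test;
import org.junit.rules.TemporaryFolder;

public class FakeStageReadTest {

  @Rule public final transient TestPipeline pipeline = TestPipeline.create();
  @Rule public final TemporaryFolder stageDir = new TemporaryFolder();

  @Test
  public void readsCsvFilesFromFakeStage() throws Exception {
    // Pretend these files were produced by a COPY INTO @stage unload.
    Files.write(stageDir.newFile("part-0.csv").toPath(),
        Arrays.asList("1,alice", "2,bob"));
    Files.write(stageDir.newFile("part-1.csv").toPath(),
        Arrays.asList("3,carol"));

    // Exercise only the "read staged files in parallel" half of the IO.
    PCollection<String> rows =
        pipeline.apply(TextIO.read().from(stageDir.getRoot() + "/part-*.csv"));

    PAssert.that(rows).containsInAnyOrder("1,alice", "2,bob", "3,carol");
    pipeline.run().waitUntilFinish();
  }
}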

Also, we should probably ensure that the FileIO part is generic enough that we
can use S3 too, since users may be running Snowflake on AWS as well.
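
On that point, as long as the staging location is just a configurable URI and
the read goes through Beam's FileSystems/FileIO, the URI scheme should already
select the right filesystem. A tiny sketch (assuming the S3 filesystem module,
beam-sdks-java-io-amazon-web-services, is on the classpath and configured):

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class GenericStageRead {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // The staging location is just a URI; "gs://..." or "s3://..." selects
    // the registered Beam FileSystem, so the IO code itself stays generic.
    String stagingPattern = "gs://my-bucket/stage/part-*.csv"; // or s3://...

    p.apply(FileIO.match().filepattern(stagingPattern))
     .apply(FileIO.readMatches())
     .apply(TextIO.readFiles());

    p.run().waitUntilFinish();
  }
}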


On Tue, Mar 24, 2020 at 10:10 AM Ismaël Mejía <ieme...@gmail.com> wrote:

> Great!
> It seems this pattern (COPY + parallel file read) is becoming a standard
> for 'data warehouses'. We are using something similar in the AWS Redshift
> PR (WIP); for details: https://github.com/apache/beam/pull/10206
>
> Maybe it is worth for all of us to check and see if we can converge the
> implementations as much as possible to provide users a consistent
> experience.
>
>
> On Tue, Mar 24, 2020 at 10:02 AM Elias Djurfeldt <
> elias.djurfe...@mirado.com> wrote:
>
>> Awesome job! I'm very interested in the cross-language support.
>>
>> Cheers,
>>
>> On Tue, 24 Mar 2020 at 01:20, Chamikara Jayalath <chamik...@google.com>
>> wrote:
>>
>>> Sounds great. It looks like the operation of the Snowflake source will be
>>> similar to the BigQuery source (export files to GCS and read the files).
>>> This will allow you to better parallelize reading (the current JDBC source
>>> is limited to one worker when reading).
>>>
>>> Seems like you already support initial splitting using files -
>>> https://github.com/PolideaInternal/beam/blob/snowflake-io/sdks/java/io/snowflake/src/main/java/org/apache/beam/sdk/io/snowflake/SnowflakeIO.java#L374
>>> You should probably also consider supporting dynamic work rebalancing when
>>> runners support this through SDF.
>>>
>>> Thanks,
>>> Cham
>>>
>>>
>>>
>>>
>>> On Mon, Mar 23, 2020 at 9:49 AM Alexey Romanenko <
>>> aromanenko....@gmail.com> wrote:
>>>
>>>> Great! It is always welcome to have more IOs in Beam. I'd be happy
>>>> to take a look at your PR once it is created.
>>>>
>>>> Just a couple of questions for now.
>>>>
>>>> 1) Afaik, you can connect to Snowflake using the standard JDBC driver. Do
>>>> you plan to compare performance between this SnowflakeIO and Beam JdbcIO
>>>> (roughly the baseline sketched below)?
>>>> 2) Are you going to support staging in other locations, like S3 and
>>>> Azure?
>>>> 3) Does “withSchema()” allow inferring the Snowflake schema into a Beam
>>>> schema?
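>>>>
>>>> For question 1, the JdbcIO baseline I have in mind would be roughly the
>>>> following (just a sketch; connection details and names are made up):
>>>>
>>>> import org.apache.beam.sdk.Pipeline;
>>>> import org.apache.beam.sdk.coders.StringUtf8Coder;
>>>> import org.apache.beam.sdk.io.jdbc.JdbcIO;
>>>> import org.apache.beam.sdk.options.PipelineOptionsFactory;
>>>>
>>>> public class SnowflakeViaJdbcIO {
>>>>   public static void main(String[] args) {
>>>>     Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
>>>>
>>>>     // Plain JdbcIO read through the Snowflake JDBC driver; this is the
>>>>     // single-worker baseline to compare the new SnowflakeIO against.
>>>>     p.apply(JdbcIO.<String>read()
>>>>         .withDataSourceConfiguration(JdbcIO.DataSourceConfiguration.create(
>>>>                 "net.snowflake.client.jdbc.SnowflakeDriver",
>>>>                 "jdbc:snowflake://<account>.snowflakecomputing.com/?db=MY_DB&schema=PUBLIC")
>>>>             .withUsername("<user>")
>>>>             .withPassword("<password>"))
>>>>         .withQuery("SELECT name FROM my_table")
>>>>         .withRowMapper(rs -> rs.getString(1))
>>>>         .withCoder(StringUtf8Coder.of()));
>>>>
>>>>     p.run().waitUntilFinish();
>>>>   }
>>>> }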
>>>>
>>>> On 23 Mar 2020, at 15:23, Katarzyna Kucharczyk <ka.kucharc...@gmail.com>
>>>> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> My colleagues and I have developed a new Java connector for Snowflake
>>>> that we would like to add to Beam.
>>>>
>>>> Snowflake is an analytic data warehouse provided as
>>>> Software-as-a-Service (SaaS). It uses a new SQL database engine with a
>>>> unique architecture designed for the cloud. To read more details please
>>>> check [1] and [2].
>>>>
>>>> The proposed Snowflake IOs use the Snowflake JDBC library [3]. The IOs are
>>>> batch write and batch read, and both use the Snowflake COPY [4] operation
>>>> underneath. In both cases, ParDos load files onto a stage and the files are
>>>> then inserted into the Snowflake table of choice using the COPY API. The
>>>> currently supported stage is Google Cloud Storage [5].
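>>>>
>>>> Conceptually, the write path does something like the following (a heavily
>>>> simplified sketch with made-up names; the actual code is in the fork
>>>> linked below):
>>>>
>>>> import java.sql.Connection;
>>>> import java.sql.DriverManager;
>>>> import java.sql.Statement;
>>>> import org.apache.beam.sdk.Pipeline;
>>>> import org.apache.beam.sdk.io.TextIO;
>>>> import org.apache.beam.sdk.options.PipelineOptionsFactory;
>>>> import org.apache.beam.sdk.transforms.Create;
>>>>
>>>> public class WritePathSketch {
>>>>   public static void main(String[] args) throws Exception {
>>>>     Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
>>>>
>>>>     // Step 1: stage the records as CSV files on GCS; the Snowflake external
>>>>     // stage @my_gcs_stage is assumed to point at this bucket/prefix.
>>>>     p.apply(Create.of("1,alice", "2,bob"))
>>>>      .apply(TextIO.write().to("gs://my-bucket/stage/part").withSuffix(".csv"));
>>>>     p.run().waitUntilFinish();
>>>>
>>>>     // Step 2: load the staged files into the target table via the COPY API.
>>>>     try (Connection conn = DriverManager.getConnection(
>>>>             "jdbc:snowflake://<account>.snowflakecomputing.com/",
>>>>             "<user>", "<password>");
>>>>          Statement stmt = conn.createStatement()) {
>>>>       stmt.execute(
>>>>           "COPY INTO my_table FROM @my_gcs_stage FILE_FORMAT = (TYPE = CSV)");
>>>>     }
>>>>   }
>>>> }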
>>>>
>>>> The diagram shows how the Snowflake Read IO works (the write operation
>>>> works similarly, but in the opposite direction).
>>>> Here is an Apache Beam fork [6] with the current work on the Snowflake IO.
>>>>
>>>> In the near future we would also like to add an IO for writing streams,
>>>> which will use Snowpipe, Snowflake's mechanism for continuous loading [7].
>>>> We would also like to use cross-language transforms to provide Python
>>>> connectors as well.
>>>>
>>>> We are open to all opinions and suggestions. If you have any questions or
>>>> comments, please do not hesitate to post them.
>>>>
>>>> If there are no objections, I will create Jira tickets and share them in
>>>> this thread.
>>>>
>>>> Cheers,
>>>> Kasia
>>>>
>>>> [1] https://www.snowflake.com
>>>> [2]
>>>> https://docs.snowflake.net/manuals/user-guide/intro-key-concepts.html
>>>> [3] https://docs.snowflake.net/manuals/user-guide/jdbc.html
>>>> [4]
>>>> https://docs.snowflake.com/en/sql-reference/sql/copy-into-table.html
>>>> [5] https://cloud.google.com/storage
>>>> [6]
>>>> https://github.com/PolideaInternal/beam/tree/snowflake-io/sdks/java/io/snowflake
>>>> [7]
>>>> https://docs.snowflake.net/manuals/user-guide/data-load-snowpipe.html
>>>>
>>>>
>>>>
>>
>> --
>> Elias Djurfeldt
>> Mirado Consulting
>>
>
