Re: Contribution question

Bertty Contreras Sun, 02 Jan 2022 16:28:46 -0800

The main idea of Wayang is to provide a layer that picks the best
combination of platforms to process a query; you can see the details in the
RHEEMix paper [1].

Providing a SQL API will then allow us to transform a query into different
Wayang operators, which will allow optimization across platforms that only
speak SQL, like Postgres, and platforms that have no SQL language, like Giraph.

The idea of using Calcite comes from the intermediate representation
that Calcite generates, which will allow us to create the Wayang plan with
“UDFs” that can be translated back to SQL or translated to executable
code that can be run by Flink, as an example.
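
To make the "translate back to SQL or to executable code" idea concrete,
here is a minimal sketch in Python. It is not Wayang or Calcite code: the
classes GreaterThan and FilterScan are hypothetical stand-ins for nodes of
an intermediate representation, and plain Python lists stand in for a
platform like Flink. It only shows one plan node being rendered in two ways.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Row = Dict[str, int]


@dataclass(frozen=True)
class GreaterThan:
    """Hypothetical predicate node: `column > value`."""
    column: str
    value: int

    def to_sql(self) -> str:
        # Rendering for a platform that only understands SQL.
        return f"{self.column} > {self.value}"

    def to_callable(self) -> Callable[[Row], bool]:
        # Rendering as a function ("UDF") for a platform that runs code.
        return lambda row: row[self.column] > self.value


@dataclass(frozen=True)
class FilterScan:
    """Hypothetical plan node: scan a table, filter rows, project columns."""
    table: str
    columns: Tuple[str, ...]
    predicate: GreaterThan

    def to_sql(self) -> str:
        cols = ", ".join(self.columns)
        return f"SELECT {cols} FROM {self.table} WHERE {self.predicate.to_sql()}"

    def execute(self, rows: List[Row]) -> List[Row]:
        keep = self.predicate.to_callable()
        return [{c: row[c] for c in self.columns} for row in rows if keep(row)]


plan = FilterScan("A", ("a", "b"), GreaterThan("a", 10))
sql_text = plan.to_sql()
in_memory = plan.execute([{"a": 5, "b": 1}, {"a": 20, "b": 2}])
```

The same plan object yields a SQL string for a SQL-only platform and a
filtered list for a code-executing one, which is the property the
intermediate representation is meant to give us.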

Imagine a query that says something like:

Select A.a, A.b, A.c from A join X on A.a = X.a ….

Then X (10TB) is on HDFS and A (100MB) is on Postgres, so the plan to
execute will be something like:

Select A.a from A (1MB); this result is small, so you can broadcast it and
filter X using Flink.

If the join result is just 2 records, Wayang will perform the query
on Postgres using those 2 records as the condition.

But it could also occur that the join result is 1TB; in that case, the data
in Postgres will be moved to HDFS and all the rest of the process will
run on Flink.

Currently the optimizer decides which platform will be used depending on
the amount of data to process and the data movement. The SQL API will then
give us more “freedom” in those decisions, because we will have the whole
intermediate representation available to perform changes.
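
As a rough illustration of that decision, here is a toy sketch in Python.
It is an assumption-laden simplification, not the Wayang optimizer: the
function choose_plan and the Plan class are hypothetical, and the only
"cost" considered is how many bytes have to move between Postgres and HDFS.
The real optimizer uses the cost model described in the paper [1].

```python
from dataclasses import dataclass

MB = 1024 ** 2
TB = 1024 ** 4


@dataclass(frozen=True)
class Plan:
    finish_on: str          # platform that runs the rest of the query
    data_moved_bytes: int   # bytes shipped between platforms


def choose_plan(join_result_bytes: int, postgres_table_bytes: int) -> Plan:
    """Pick the option that moves fewer bytes (toy cost model, an assumption).

    Option 1: ship the join result to Postgres and finish the query there.
    Option 2: ship the Postgres table to HDFS and finish everything on Flink.
    """
    if join_result_bytes <= postgres_table_bytes:
        return Plan(finish_on="postgres", data_moved_bytes=join_result_bytes)
    return Plan(finish_on="flink", data_moved_bytes=postgres_table_bytes)


# The two cases from the example: a tiny join result vs. a 1TB join result,
# with the Postgres table A at 100MB. The 200-byte figure for "2 records"
# is made up for illustration.
small_case = choose_plan(join_result_bytes=200, postgres_table_bytes=100 * MB)
large_case = choose_plan(join_result_bytes=1 * TB, postgres_table_bytes=100 * MB)
```

With a tiny join result the sketch finishes on Postgres, and with a 1TB
join result it moves A to HDFS and finishes on Flink, matching the two
cases above.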

After we have the SQL API we will be adding platforms that only support
SQL ;), as you said.

The idea of using the intermediate representation may sound weird to
you, but we can have a meeting to explain it better, so you can
understand the full concept and also give us your feedback. Let me know
if you are available and when, and I will free up my schedule for it ;).
I’m in Germany, just so you can figure out whether we have some timezone
differences ;).

Best regards,
Bertty

[1]
https://wayang.apache.org/assets/pdf/paper/journal_vldb.pdf

On Sun 2. Jan 2022 at 17:43, kamalesh palanisamy <[email protected]>
wrote:

> Hi Bertty,
> Thank you for the information! I would love to work on adding the SQL API
> for Wayang. Basically, now I need to add a new platform for the
> wayang-platforms that supports SQL through apache calcite? Am I right?
> Please do correct me if I am wrong.
>
> Thanks,
> Kamalesh P
>
>
> On Sun, Jan 2, 2022 at 3:36 AM Bertty Contreras <[email protected]>
> wrote:
>
>> Hi Kamalesh,
>>
>> Currently, Apache Wayang(Incubating) has the issues listed in Jira [1].
>> One feature that the community didn't have time to work on is the SQL API
>> for Apache Wayang(Incubating) [2]; the main idea is to use Apache Calcite
>> [3] as the SQL parser and then do something like the Spark adapter of
>> Calcite [4]. If you want to contribute to this feature, that would be
>> awesome :D.
>>
>> If you find another issue interesting, let me know; or if you have
>> an idea for a feature, that would be awesome too :D
>>
>> Best regards,
>> Bertty
>>
>> [1] https://issues.apache.org/jira/projects/WAYANG
>> [2]
>> https://issues.apache.org/jira/projects/WAYANG/issues/WAYANG-25?filter=allopenissues
>> [3] https://calcite.apache.org
>> [4] https://github.com/apache/calcite/tree/master/spark
>>
>> On Sun, Jan 2, 2022 at 6:50 AM kamalesh palanisamy <[email protected]>
>> wrote:
>>
>>> Hi,
>>> My name is Kamalesh and I am currently looking to contribute to the
>>> project, but I couldn't find any suitable issues. Can you point me to any
>>> features you would like me to contribute to? Thanks!
>>> Thanks,
>>> Kamalesh P
>>>
>>
