Here's the initial pull request for this issue; please review and let me know your feedback. I had to combine the two approaches so that this works both for standard .read(), where we can add the schema option, and for catalog-based selects, where we use schemaName.tableName. Happy to discuss on a call if this isn't clear.
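To illustrate the combined approach described above, here is a minimal, self-contained Scala sketch of how a schema might be resolved from either a qualified table identifier or an explicit option. The helper name `resolveSchemaAndTable` and the way the fallback is modelled are illustrative assumptions, not the actual code in the PR:

```scala
// Hypothetical sketch (not the actual Ignite implementation) of resolving a
// schema from the two sources discussed in the thread: a catalog-style
// qualified name ("schemaName.tableName") or an explicit schema option
// passed to the Spark data source.
object SchemaResolution {

  /** Returns (optional schema, table name). A qualified identifier takes
    * precedence; otherwise fall back to the explicit schema option, if any. */
  def resolveSchemaAndTable(tableIdentifier: String,
                            schemaOption: Option[String]): (Option[String], String) =
    tableIdentifier.split('.') match {
      case Array(schema, table) => (Some(schema), table)  // "schemaName.tableName"
      case Array(table)         => (schemaOption, table)  // plain name + option fallback
      case _                    =>
        throw new IllegalArgumentException(s"Bad table identifier: $tableIdentifier")
    }
}
```

Usage would look like `resolveSchemaAndTable("mySchema.person", None)`, yielding the schema from the identifier, while `resolveSchemaAndTable("person", Some("mySchema"))` picks it up from the option instead.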
https://github.com/apache/ignite/pull/4551

On Thu, Aug 9, 2018 at 2:32 PM, Stuart Macdonald <stu...@stuwee.org> wrote:

> Hi Nikolay, yes would be happy to - will likely be early next week. I'll
> go with the approach of adding a new optional field to the Spark data
> source provider unless there are any objections.
>
> Stuart.
>
> > On 9 Aug 2018, at 14:20, Nikolay Izhikov <nizhi...@apache.org> wrote:
> >
> > Stuart, do you want to work on this ticket?
> >
> > On Tue, 07/08/2018 at 11:13 -0700, Stuart Macdonald wrote:
> >> Thanks Val, here's the ticket:
> >>
> >> https://issues.apache.org/jira/projects/IGNITE/issues/IGNITE-9228
> >> <https://issues.apache.org/jira/projects/IGNITE/issues/IGNITE-9228?filter=allopenissues>
> >>
> >> (Thanks for correcting my terminology - I work mostly with the traditional
> >> CacheConfiguration interface, where I believe each cache occupies its own
> >> schema.)
> >>
> >> Stuart.
> >>
> >> On 7 Aug 2018, at 18:34, Valentin Kulichenko <valentin.kuliche...@gmail.com> wrote:
> >>
> >> Stuart,
> >>
> >> Two tables can have the same name only if they are located in different
> >> schemas. That said, adding schema name support makes sense to me for sure.
> >> We can implement this using either a separate SCHEMA_NAME parameter, or
> >> something similar to what you suggested in option 3, but with schema name
> >> instead of cache name.
> >>
> >> Please feel free to create a ticket.
> >>
> >> -Val
> >>
> >> On Tue, Aug 7, 2018 at 9:32 AM Stuart Macdonald <stu...@stuwee.org> wrote:
> >>
> >> Hello Igniters,
> >>
> >> The Ignite Spark SQL interface currently takes just "table name" as a
> >> parameter, which it uses to supply a Spark dataset with data from the
> >> underlying Ignite SQL table of that name.
> >>
> >> To do this it loops through each cache and finds the first one with the
> >> given table name [1]. This causes issues if there are multiple tables
> >> registered in different caches with the same table name, as you can only
> >> access one of those caches from Spark. Is the right thing to do here:
> >>
> >> 1. Simply not support such a scenario and note in the Spark documentation
> >> that table names must be unique?
> >> 2. Pass an extra parameter through the Ignite Spark data source which
> >> optionally specifies the cache name?
> >> 3. Support namespacing in the existing table name parameter, i.e.
> >> "cacheName.tableName"?
> >>
> >> Thanks,
> >> Stuart.
> >>
> >> [1]
> >> https://github.com/apache/ignite/blob/ca973ad99c6112160a305df05be9458e29f88307/modules/spark/src/main/scala/org/apache/ignite/spark/impl/package.scala#L119
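The ambiguity described in the original message can be sketched in a few lines of plain Scala. This is a toy model of the first-match lookup pattern referenced at [1], not the actual Ignite code; the cache names and table contents are made up for illustration:

```scala
// Toy model of the problematic lookup: iterate over caches and return the
// first one containing a table with the given name. When two caches both
// register a "person" table, whichever cache is visited first silently wins,
// and the other becomes unreachable from Spark.
object FirstMatchLookup {

  // Illustrative data: cache name -> table names registered in that cache.
  val caches: Seq[(String, Set[String])] = Seq(
    "cacheA" -> Set("person"),
    "cacheB" -> Set("person", "city")
  )

  /** First cache containing the table wins; ties are resolved by iteration order. */
  def cacheForTable(tableName: String): Option[String] =
    caches.collectFirst { case (cache, tables) if tables.contains(tableName) => cache }
}
```

Here `cacheForTable("person")` always resolves to `"cacheA"`, so the `person` table in `cacheB` can never be selected - which is the motivation for adding a schema (or cache) qualifier as discussed in the thread.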