On Thu, Aug 3, 2017 at 9:04 AM, Jörn Franke <jornfra...@gmail.com> wrote:
> I think the development effort would still be higher. Everything would
> have to be put via JDBC into Ignite, then checkpointing would have to be
> done via JDBC (again, additional development effort), plus a lot of
> conversion from the Spark internal format to JDBC and back to the Ignite
> internal format. Pagination I do not see as a useful feature for managing
> large data volumes from databases - on the contrary, it is very
> inefficient (and one would have to implement logic to fetch all pages).
> Pagination was also never intended for fetching large data volumes, but
> for web pages showing a small result set over several pages, where the
> user can click manually for the next page (which they mostly do not do
> anyway).
>
> While it might be a quick solution, I think a deeper integration than
> JDBC would be more beneficial.
>

Jorn, I completely agree. However, we have not been able to find a
contributor for this feature. You sound like you have sufficient domain
expertise in Spark and Ignite. Would you be willing to help out?


> On 3. Aug 2017, at 08:57, Dmitriy Setrakyan <dsetrak...@apache.org>
> wrote:
> >
> >> On Thu, Aug 3, 2017 at 8:45 AM, Jörn Franke <jornfra...@gmail.com>
> >> wrote:
> >>
> >> I think the JDBC one is more inefficient and slower, and requires too
> >> much development effort. You can also check the integration of Alluxio
> >> with Spark.
> >>
> >
> > As far as I know, Alluxio is a file system, so it cannot use JDBC.
> > Ignite, on the other hand, is an SQL system and works well with JDBC.
> > As far as the development effort, we are dealing with SQL, so I am not
> > sure why JDBC would be harder.
> >
> > Generally speaking, until Ignite provides native data frame
> > integration, having JDBC-based integration out of the box is minimally
> > acceptable.
> >
> >> Then, in general I think JDBC was never designed for large data
> >> volumes. It is for executing queries and getting a small or aggregated
> >> result set back. Alternatively, for inserting / updating single rows.
> >>
> >
> > Agree in general. However, Ignite JDBC is designed to work with larger
> > data volumes and supports data pagination automatically.
> >
> >>> On 3. Aug 2017, at 08:17, Dmitriy Setrakyan <dsetrak...@apache.org>
> >>> wrote:
> >>>
> >>> Jorn, thanks for your feedback!
> >>>
> >>> Can you explain how the direct support would be different from the
> >>> JDBC support?
> >>>
> >>> Thanks,
> >>> D.
> >>>
> >>>> On Thu, Aug 3, 2017 at 7:40 AM, Jörn Franke <jornfra...@gmail.com>
> >>>> wrote:
> >>>>
> >>>> These are two different things. Spark applications themselves do not
> >>>> use JDBC - it is more for non-Spark applications to access Spark
> >>>> DataFrames.
> >>>>
> >>>> Direct support by Ignite would make more sense. Although you have
> >>>> IGFS in theory, that only helps if the user is using HDFS, which
> >>>> might not be the case. It is now also very common to use object
> >>>> stores, such as S3.
> >>>> Direct support could be leveraged for interactive analysis or for
> >>>> different Spark applications sharing data.
> >>>>
> >>>>> On 3. Aug 2017, at 05:12, Dmitriy Setrakyan <dsetrak...@apache.org>
> >>>>> wrote:
> >>>>>
> >>>>> Igniters,
> >>>>>
> >>>>> We have had the integration with Spark Data Frames on our roadmap
> >>>>> for a while:
> >>>>> https://issues.apache.org/jira/browse/IGNITE-3084
> >>>>>
> >>>>> However, while browsing the Spark documentation, I came across the
> >>>>> generic JDBC data frame support in Spark:
> >>>>> https://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases
> >>>>>
> >>>>> Given that Ignite has a JDBC driver, does it mean that it
> >>>>> transitively also supports Spark data frames? If yes, we should
> >>>>> document it.
> >>>>>
> >>>>> D.
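
[Editor's note] For readers following the thread: the "generic JDBC route"
under discussion would look roughly like the sketch below in PySpark. This
is a hypothetical illustration, not code from the thread - the cache name
("Person"), host/port, and the legacy Ignite client JDBC driver class are
assumptions; check the Ignite docs for the driver class and URL format
matching your version, and put the Ignite JDBC driver jar on the Spark
classpath (e.g. via --jars).

```python
# Sketch: reading an Ignite SQL table into a Spark DataFrame through
# Spark's generic JDBC data source. Names and URL are illustrative.

# Assumed: legacy Ignite client JDBC driver and its URL scheme.
IGNITE_JDBC_URL = "jdbc:ignite://127.0.0.1:11211/PersonCache"
IGNITE_JDBC_DRIVER = "org.apache.ignite.IgniteJdbcDriver"


def ignite_jdbc_options(table):
    """Options dict for spark.read.format("jdbc") pointing at an Ignite node."""
    return {
        "url": IGNITE_JDBC_URL,
        "driver": IGNITE_JDBC_DRIVER,
        "dbtable": table,
    }


def read_ignite_table(spark, table):
    # spark is a SparkSession; the Ignite JDBC driver jar must be on the
    # Spark classpath for this call to succeed.
    return (
        spark.read.format("jdbc")
        .options(**ignite_jdbc_options(table))
        .load()
    )
```

This is exactly the "out of the box" path Dmitriy describes: no
Ignite-specific Spark code, but every row is serialized through JDBC, which
is the overhead Jörn objects to compared with a native data frame
integration.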