On Thu, Aug 3, 2017 at 9:04 AM, Jörn Franke <jornfra...@gmail.com> wrote:
> I think the development effort would still be higher. Everything would
> have to be put via JDBC into Ignite, then checkpointing would have to be
> done via JDBC (again, additional development effort), plus a lot of
> conversion from the Spark internal format to JDBC and back to the Ignite
> internal format. Pagination I do not see as a useful feature for managing
> large data volumes from databases - on the contrary, it is very
> inefficient (and one would have to implement logic to fetch all pages).
> Pagination was also never intended for fetching large data volumes, but
> for web pages showing a small result set over several pages, where the
> user can click manually for the next page (which they mostly do not do
> anyway).
>
> While it might be a quick solution, I think a deeper integration than
> JDBC would be more beneficial.
>

Jorn, I completely agree. However, we have not been able to find a
contributor for this feature. You sound like you have sufficient domain
expertise in Spark and Ignite. Would you be willing to help out?


> On 3. Aug 2017, at 08:57, Dmitriy Setrakyan <dsetrak...@apache.org>
> wrote:
> >
> >> On Thu, Aug 3, 2017 at 8:45 AM, Jörn Franke <jornfra...@gmail.com>
> >> wrote:
> >>
> >> I think the JDBC one is more inefficient and slower, and requires too
> >> much development effort. You can also check the integration of Alluxio
> >> with Spark.
> >>
> >
> > As far as I know, Alluxio is a file system, so it cannot use JDBC.
> > Ignite, on the other hand, is an SQL system and works well with JDBC.
> > As far as the development effort, we are dealing with SQL, so I am not
> > sure why JDBC would be harder.
> >
> > Generally speaking, until Ignite provides native data frame
> > integration, having JDBC-based integration out of the box is minimally
> > acceptable.
> >
> >> Then, in general I think JDBC was never designed for large data
> >> volumes. It is for executing queries and getting a small or aggregated
> >> result set back. Alternatively, for inserting / updating single rows.
> >>
> >
> > Agree in general. However, Ignite JDBC is designed to work with larger
> > data volumes and supports data pagination automatically.
> >
> >>> On 3. Aug 2017, at 08:17, Dmitriy Setrakyan <dsetrak...@apache.org>
> >>> wrote:
> >>>
> >>> Jorn, thanks for your feedback!
> >>>
> >>> Can you explain how the direct support would be different from the
> >>> JDBC support?
> >>>
> >>> Thanks,
> >>> D.
> >>>
> >>>> On Thu, Aug 3, 2017 at 7:40 AM, Jörn Franke <jornfra...@gmail.com>
> >>>> wrote:
> >>>>
> >>>> These are two different things. Spark applications themselves do not
> >>>> use JDBC - it is more for non-Spark applications to access Spark
> >>>> DataFrames.
> >>>>
> >>>> Direct support by Ignite would make more sense. Although you have
> >>>> IGFS in theory, that only helps if the user is using HDFS, which
> >>>> might not be the case. It is now also very common to use object
> >>>> stores, such as S3.
> >>>> Direct support could be leveraged for interactive analysis or for
> >>>> different Spark applications sharing data.
> >>>>
> >>>>> On 3. Aug 2017, at 05:12, Dmitriy Setrakyan <dsetrak...@apache.org>
> >>>>> wrote:
> >>>>>
> >>>>> Igniters,
> >>>>>
> >>>>> We have had the integration with Spark Data Frames on our roadmap
> >>>>> for a while:
> >>>>> https://issues.apache.org/jira/browse/IGNITE-3084
> >>>>>
> >>>>> However, while browsing the Spark documentation, I came across the
> >>>>> generic JDBC data frame support in Spark:
> >>>>> https://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases
> >>>>>
> >>>>> Given that Ignite has a JDBC driver, does it mean that it
> >>>>> transitively also supports Spark data frames? If yes, we should
> >>>>> document it.
> >>>>>
> >>>>> D.
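
[Editor's note] For readers following the thread: the "generic JDBC route"
under discussion would look roughly like the sketch below in PySpark. This
is a hypothetical illustration, not code from the thread - the cache name
("Person"), host/port, and the legacy Ignite client JDBC driver class are
assumptions; check the Ignite docs for the driver class and URL format
matching your version, and put the Ignite JDBC driver jar on the Spark
classpath (e.g. via --jars).

```python
# Sketch: reading an Ignite SQL table into a Spark DataFrame through
# Spark's generic JDBC data source. Names and URL are illustrative.

# Assumed: legacy Ignite client JDBC driver and its URL scheme.
IGNITE_JDBC_URL = "jdbc:ignite://127.0.0.1:11211/PersonCache"
IGNITE_JDBC_DRIVER = "org.apache.ignite.IgniteJdbcDriver"


def ignite_jdbc_options(table):
    """Options dict for spark.read.format("jdbc") pointing at an Ignite node."""
    return {
        "url": IGNITE_JDBC_URL,
        "driver": IGNITE_JDBC_DRIVER,
        "dbtable": table,
    }


def read_ignite_table(spark, table):
    # spark is a SparkSession; the Ignite JDBC driver jar must be on the
    # Spark classpath for this call to succeed.
    return (
        spark.read.format("jdbc")
        .options(**ignite_jdbc_options(table))
        .load()
    )
```

This is exactly the "out of the box" path Dmitriy describes: no
Ignite-specific Spark code, but every row is serialized through JDBC, which
is the overhead Jörn objects to compared with a native data frame
integration.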