Re: Trio: AsterixDB, Spark and Zeppelin.

Coakley, Kevin Wed, 10 Aug 2016 13:34:19 -0700

Mike,

UCLA wanted a way to use Spark’s Machine Learning packages with data stored 
in AsterixDB. We started looking at the Spark connector as a way to access the 
data in AsterixDB directly instead of having to export the data from AsterixDB 
to a file and import the file into Spark. I don’t know how this fits into 
Amarnath’s projects; I was just following up on a request from UCLA to see what 
would be involved in providing this Spark connector to others.

The current status is: I have the Spark connector working in a test environment 
with the queries provided by Wail. I was planning on loading a small amount of 
data into the test AsterixDB server with the Schema Inferencer code and running 
my own queries, but I have not had time yet. The issue with providing others 
with access to the Spark connector is that the version of AsterixDB we are 
running that contains the Twitter data does not have the Schema Inferencer code 
and therefore will not work with the Spark connector.

I don’t believe SDSC would want to update the AsterixDB servers that contain 
the Twitter data with the Schema Inferencer code until after it has been 
approved by you and merged into the master branch. However, even after the 
Schema Inferencer code has been merged into the master branch, we wouldn’t have 
it ready for people to use right away. 

I offered to load a small subset of the data from our main servers into my test 
environment that has a working Spark connector for UCLA to test, but it sounds 
like they misunderstood my offer.

I would be happy to help you test the Schema Inferencer and Spark connector if 
you have specific items that you want me to check. I can also give others that 
you select access to the test environment so they can run tests themselves. 
Otherwise, I will respond here if I discover any issues.

My current test environment is Zeppelin with the Spark connector on server A, 
AsterixDB with the Schema Inferencer code on server B and a Spark 1.6.0 cluster 
running on servers C, D and E.

-Kevin

On 8/10/16, 9:36 AM, "Mike Carey" <[email protected]> wrote:

    Kevin,

    Q:  Could you chime back in here - please 'cc' the user list - with a 
    brief (maybe one paragraph) summary of what you are actually trying to 
    do at the moment and what its current status is?  (And your timeframe, 
    etc.?)

    My impression until yesterday was that you were slowly/leisurely 
    exploring the new Spark connector to AsterixDB that Wail worked on - 
    essentially as his first "beta" user - and that things were moving at 
    the pace you wanted (and were setting).  As an early adopter, I was also 
    under the impression that you were using his branch for your 
    explorations, while he was addressing code review comments, etc.  
    However, when I arrived back home in OC after a trip yesterday, I was 
    the recipient of a message (via a back channel) warning me that there 
    was a blocking issue at SDSC that UCI wasn't being attentive to, one 
    that had AsterixDB on the brink of being given up on by the UCLA folks, and 
    that we'd better get on it or....  (Meanwhile I had not heard any such 
    thing from UCLA directly; I was not aware of any blocking Spark issues 
    for SDSC nor of any transitively blocking implications for UCLA, and it 
    still doesn't look from what I see below like there was one.)

    I think that we need to have SDSC's activities be much more visible here 
    - likewise for UCLA's - so that the Apache AsterixDB community has much 
    better visibility into the goals, activities, progress, and problems of 
    our early adopters.  The community wants users to be successful!  It 
    will be much more effective (and healthy and productive) if we all know 
    what's going on and it is clear to all how each of those things is going.

    Thanks!

    Mike

    On 8/10/16 8:36 AM, Wail Alkowaileet wrote:
    > Hi Kevin,
    >
    > Cool!
    > Please let me know if you need any assistance.
    >
    > On Aug 8, 2016 1:42 PM, "Coakley, Kevin" <[email protected]> wrote:
    >
    >> Hi Wail,
    >>
    >> I figured out the problem: AsterixDB was configured for 127.0.0.1. The
    >> notebook at
    >> https://github.com/Nullification/asterixdb-spark-connector/blob/master/zeppelin-notebook/asterixdb-spark-example/note.json
    >> ran successfully once I recreated the AsterixDB instance to use the
    >> external IP.
    >>
    >> I have not run any of my own queries, but I did get both of the examples at
    >> https://github.com/Nullification/asterixdb-spark-connector to run
    >> successfully.
    >>
    >> Thank you!
    >>
    >> -Kevin
    >>
    >>
    >>
    >> On 8/3/16, 10:23 AM, "Wail Alkowaileet" <[email protected]> wrote:
    >>
    >>      One more thing:
    >>      Can you paste your cluster configuration as well?
    >>
    >>      Thanks
    >>
    >>     (ETC ETC ETC deleted)
