Re: Working with Hadoop

Till Westmann Thu, 21 Jul 2016 16:32:59 -0700

Ok, I’ve filed 2 issues
https://issues.apache.org/jira/browse/ASTERIXDB-1540
https://issues.apache.org/jira/browse/ASTERIXDB-1541

and I’ve assigned the second one (update dependencies) to Ian as I think that he is familiar with the field and probably the only one that knows about the YARN part :)


Cheers,
Till

On 21 Jul 2016, at 13:45, Mike Carey wrote:

IMO:  Yes to all...  :-)


On 7/21/16 12:57 PM, Till Westmann wrote:
Ok, so would it make sense (and work) to update all of our dependencies to the latest 2.6 release?
Longer term - if we want to continue to support HDFS - it seems that we should think about being able to support different versions of HDFS with the same AsterixDB instance. That way we could use and combine data from different clusters with the data in AsterixDB.
Does that make sense?
Would that be desirable and feasible?

Cheers,
Till

On 21 Jul 2016, at 11:10, Mike Carey wrote:
My 0.15 cents' worth:
1 is of definite interest as a way of sneakily expanding our turf - AsterixDB is in the "NoSQL on steroids" space, in terms of our features and functionality - but can properly encroach on the "SQL on Hadoop" analytics world with 1. That's something that's of interest, I think. For now I think supporting one popular version of Hadoop is good - so 2.x.x is a fine answer for that.
2 was an NSF deliverable and we felt it would be helpful w.r.t. the world of 1 - i.e., maybe folks would be more comfortable running us in their data centers if their YARN sysadmins could be the resource/etc managers. I think that's also still of interest, and both 1 and 2 are things we should maintain.
3 is for an interesting/fun research question - namely, would AsterixDB on HDFS storage be better from a replication, etc., standpoint than AsterixDB doing everything natively and using DB-style replication. The goal of 3 is to explore that question but not to make HDFS-ified AsterixDB a released/supported feature in AsterixDB in any particular timeframe. At the time we started looking at 3, we were also thinking it might (albeit misguidedly :-)) make potential "enterprise adopters" of AsterixDB happier to "know that their data is safely kept in HDFS". (Never mind that we could corrupt the details of their data and make it unusable still. :-)) I think that's no longer something we need to worry about as a reason for 3 - the real reason for 3 is experimental systems research (i.e., the native vs. HDFS performance issues study).
Cheers,

Mike


On 7/21/16 1:49 AM, abdullah alamoudi wrote:
I think that list is all we've got. We only support Hadoop 2.x.x.
We found that supporting both 1.x and 2.x has a cost that we couldn't afford. I believe there are fundamental differences between Hadoop 1.x and 2.x and that a good segment of the Hadoop community still uses 1.x. However, it has been a while since 1.x got a new release and so, I am not sure if it is worth investing time in making it work.
Also, it seems to me that our Hadoop support is mainly for attracting existing users of Hadoop and so, I really think we should not invest in that area anymore. The only thing that I think we should continue doing is maybe add more tests (for different formats, etc.). That is just my opinion :)

What happened to Hadoop Compatibility Layer? Is that still a thing?

On Thu, Jul 21, 2016 at 5:24 AM, Ian Maxon <[email protected]> wrote:
That's all the ways we use Hadoop at the moment that I can think of as well. Maybe the two other minor ones are zookeeper and HDFS backup in Managix.
For 1) and 2) it's using Hadoop 2.2.0 right now. In my experimental branch for 3) I'm using 2.6.0, it doesn't cause any more issues for me than 2.2.0. I believe 1) used to support Hadoop 0.20.0 and other 1.x versions but I'm not sure if that works anymore.
On Wed, Jul 20, 2016 at 7:14 PM, Till Westmann <[email protected]> wrote:
Hi everybody,
recently the topic of Hadoop support came up and I realized that my understanding is quite spotty so I’m trying to understand where we are.
AFAIK we support
1) HDFS for (potentially indexed) external datasets,
2) YARN as a resource manager, and
3) HDFS as a basis for internal storage.
Is this list complete or do we have other Hadoop touchpoints?
I believe that 1) and 2) should be reasonably stable and that 3) is still in the works. Is that correct?

Further I'm wondering
a) which versions of Hadoop we support and
b) which ones we should support for all the cases.
Please chime in on this as well.
Any other things that anybody working with AsterixDB and Hadoop should be aware of?

Thanks!
Till
