Ok, I’ve filed 2 issues
https://issues.apache.org/jira/browse/ASTERIXDB-1540
https://issues.apache.org/jira/browse/ASTERIXDB-1541
and I’ve assigned the second one (update dependencies) to Ian as I
think that he is familiar with the field and probably the only one that
knows about the YARN part :)
Cheers,
Till
On 21 Jul 2016, at 13:45, Mike Carey wrote:
IMO: Yes to all... :-)
On 7/21/16 12:57 PM, Till Westmann wrote:
Ok, so would it make sense (and work) to update all of our
dependencies to the latest 2.6 release?
Longer term - if we want to continue to support HDFS - it seems that
we should think about being able to support different versions of
HDFS with the same AsterixDB instance. That way we could use and
combine data from different clusters with the data in AsterixDB.
Does that make sense?
Would that be desirable and feasible?
Cheers,
Till
On 21 Jul 2016, at 11:10, Mike Carey wrote:
My 0.15 cents' worth:
1 is of definite interest as a way of sneakily expanding our turf -
AsterixDB is in the "NoSQL on steroids" space, in terms of our
features and functionality - but can properly encroach on the "SQL
on Hadoop" analytics world with 1. That's something that's of
interest, I think. For now I think supporting one popular version
of Hadoop is good - so 2.x.x is a fine answer for that.
2 was an NSF deliverable and we felt it would be helpful w.r.t. the
world of 1 - i.e., maybe folks would be more comfortable running us
in their data centers if their YARN sysadmins could be the
resource/etc managers. I think that's also still of interest, and
both 1 and 2 are things we should maintain.
3 is for an interesting/fun research question - namely, would
AsterixDB on HDFS storage be better from a replication, etc.,
standpoint than AsterixDB doing everything natively and using
DB-style replication. The goal of 3 is to explore that question but
not to make HDFS-ified AsterixDB a released/supported feature in
AsterixDB in any particular timeframe. At the time we started
looking at 3, we were also thinking it might (albeit misguidedly
:-)) make potential "enterprise adopters" of AsterixDB happier to
"know that their data is safely kept in HDFS". (Never mind that we
could corrupt the details of their data and make it unusable still.
:-)) I think that's no longer something we need to worry about as a
reason for 3 - the real reason for 3 is experimental systems
research (i.e., the native vs. HDFS performance issues study).
Cheers,
Mike
On 7/21/16 1:49 AM, abdullah alamoudi wrote:
I think that list is all we've got. We only support Hadoop 2.x.x.
We found that supporting both 1.x and 2.x has a cost that we couldn't
afford. I believe there are fundamental differences between Hadoop 1.x
and 2.x and that a good segment of the Hadoop community still uses 1.x.
However, it has been a while since 1.x got a new release and so I am
not sure if it is worth investing time in making it work.
Also, it seems to me that our Hadoop support is mainly for attracting
existing users of Hadoop and so I really think we should not invest in
that area anymore. The only thing that I think we should continue doing
is maybe add more tests (for different formats, etc.). That is just my
opinion :)
What happened to the Hadoop Compatibility Layer? Is that still a thing?
On Thu, Jul 21, 2016 at 5:24 AM, Ian Maxon <[email protected]> wrote:
That's all the ways we use Hadoop at the moment that I can think of as
well. Maybe the two other minor ones are ZooKeeper and HDFS backup in
Managix.
For 1) and 2) it's using Hadoop 2.2.0 right now. In my experimental
branch for 3) I'm using 2.6.0; it doesn't cause any more issues for me
than 2.2.0.
I believe 1) used to support Hadoop 0.20.0 and other 1.x versions but
I'm not sure if that works anymore.
On Wed, Jul 20, 2016 at 7:14 PM, Till Westmann <[email protected]> wrote:
Hi everybody,
recently the topic of Hadoop support came up and I realized that my
understanding is quite spotty so I’m trying to understand where we are.
AFAIK we support
1) HDFS for (potentially indexed) external datasets,
2) YARN as a resource manager, and
3) HDFS as a basis for internal storage.
Is this list complete or do we have other Hadoop touchpoints?
I believe that 1) and 2) should be reasonably stable and that 3) is
still in the works. Is that correct?
Further I'm wondering
a) which versions of Hadoop we support and
b) which ones we should support for all the cases.
Please chime in on this as well.
Any other things that anybody working with AsterixDB and Hadoop should
be aware of?
Thanks!
Till