I had to think about this problem a lot for a product I worked on at one point, and I think much of the same reasoning applies here.
To Corey's point, running the rebalancer is most definitely an issue, but simply turning it off is not a good answer in a lot of situations. It exists for a reason! You can run into problems on highly utilized clusters where individual data nodes run out of disk space, and all kinds of bad things start to happen then. And if you are also using the cluster for MapReduce, you can see performance gains by rebalancing a highly utilized cluster.

In general, the placement of blocks is the NameNode's responsibility, so even if it's nice to assume that blocks get written to the local data node, that's not an assumption you can always make. There has been talk about custom block placement strategies for HDFS in the NameNode. I just checked up on it, and it does look like it is on the horizon: https://issues.apache.org/jira/browse/HDFS-2576. In theory, you could have Accumulo "hint" to the NameNode which blocks it wants colocated.

There is another interesting problem with the results of minor compactions. Let's say you've been minor compacting all day and have a dozen or so of these files written. The replication policy is pretty random. Now say the DataNode that the tablet server is running on has a fatal problem and never comes back. There is no way to "collect" the replicas together onto one DataNode; they are scattered all over the other data nodes. Eventually a major compaction happens and all is good again. There were some ideas of telling the NameNode that certain blocks have an affinity for one another, so it would keep them together.

I think this can be tested scientifically pretty easily on a live production cluster:

Step 1: Measure performance of your current application and note if it does lots of single fetches, full table scans, etc.
Step 2: Run the rebalancer.
Step 3: Measure performance again.
Step 4: Force a major compaction to move everything back (optional).

Rough sketches for the measurement piece and the compaction piece are below.
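For the measurement piece, one rough way to check locality is just to ask HDFS where the replicas of a table's file blocks actually live. Here's an untested sketch of what I mean; the table directory path and hostname handling are placeholders you'd adapt to your own cluster, and a real version would recurse since tablet directories nest one level deeper than this flat listing:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Counts how many block replicas under a table directory live on a given
    // host. Point it at something like /accumulo/tables/<tableId> and at the
    // hostname of the tablet server hosting those tablets.
    public class LocalityCheck {
      public static void main(String[] args) throws Exception {
        String tserverHost = args[0];
        Path tableDir = new Path(args[1]);
        FileSystem fs = FileSystem.get(new Configuration());

        long local = 0, total = 0;
        for (FileStatus file : fs.listStatus(tableDir)) {
          if (file.isDirectory())
            continue; // a real version would recurse into tablet directories
          for (BlockLocation block : fs.getFileBlockLocations(file, 0, file.getLen())) {
            total++;
            for (String host : block.getHosts()) {
              if (host.equals(tserverHost)) {
                local++;
                break;
              }
            }
          }
        }
        System.out.println(local + " of " + total + " blocks have a replica on " + tserverHost);
      }
    }

Run that before and after the rebalancer; if the local fraction drops a lot after Step 2, the rebalancer really is undoing your locality.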
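And for Step 4, forcing the major compaction is one call through the client API (or just "compact -t <table> -w" in the shell). Another untested sketch; the instance name, ZooKeeper host, and credentials are made up:

    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.ZooKeeperInstance;
    import org.apache.accumulo.core.client.security.tokens.PasswordToken;

    // Compacts the whole table (null start/end rows), flushing first and
    // blocking until the compaction completes. Since majors are written
    // through the local DataNodes, this pulls the data back to the tservers.
    public class ForceCompaction {
      public static void main(String[] args) throws Exception {
        Connector conn = new ZooKeeperInstance("myInstance", "zkhost:2181")
            .getConnector("root", new PasswordToken("secret"));
        conn.tableOperations().compact("mytable", null, null, true, true);
      }
    }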
Unfortunately, I don't have any systems right now that I could do this on that would provide any sort of real results.

Overall, on the question of whether it matters: it absolutely does. Slurping off disk locally is always going to be faster than slurping off disk AND going over the network. The real question is whether it's worth our time. 10GigE is a beautiful thing. In some cases it may be, in others it may not. For example, if you are just doing small fetches of data here and there, you might not notice. I imagine that if you were doing multiple large scans, you might start seeing your network get saturated. I think this also becomes a problem at larger scales, where your network infrastructure is a bit more ridiculous. Let's say for the sake of argument you have a 25,000 node Accumulo cluster... you might have some sort of tiered network where you are constrained from a throughput perspective somewhere. Locality would matter then.

My 8 cents,
-d

On Thu, Jun 19, 2014 at 12:56 PM, Josh Elser <[email protected]> wrote:

> I may also be getting this conflated with how reads work. Time for me to
> read some HDFS code.
>
> On 6/19/14, 8:52 AM, Josh Elser wrote:
>
>> I believe this happens via the DfsClient, but you can only expect the
>> first block of a file to actually be on the local datanode (assuming
>> there is one). Everything else is possible to be remote. Assuming you
>> have a proper rack script set up, you would imagine that you'll still
>> get at least one rack-local replica (so you'd have a block nearby).
>>
>> Interestingly (at least to me), I believe HBase does a bit of work in
>> region (tablet) assignments to try to maximize the locality of regions
>> WRT the datanode that is hosting the blocks that make up that file. I
>> need to dig into their code some day though.
>>
>> In general, Accumulo and HBase tend to be relatively comparable to one
>> another with performance when properly configured, which makes me apt to
>> think that data locality can help, but it's not some holy grail (of
>> course you won't ever hear me claim anything to be in that position). I
>> will say that I haven't done any real quantitative analysis either though.
>>
>> tl;dr HDFS block locality should not be affecting the functionality of
>> Accumulo.
>>
>> On 6/19/14, 7:25 AM, Corey Nolet wrote:
>>
>>> AFAIK, the locality may not be guaranteed right away unless the data
>>> for a tablet was first ingested on the tablet server that is responsible
>>> for that tablet; otherwise you'll need to wait for a major compaction to
>>> rewrite the RFiles locally on the tablet server. I would assume if the
>>> tablet server is not on the same node as the datanode, those files will
>>> probably be spread across the cluster as if you were ingesting data from
>>> outside the cloud.
>>>
>>> A recent discussion with Bill Slacum also brought to light a possible
>>> problem of the HDFS balancer [1] re-balancing blocks after the fact,
>>> which could eventually pull blocks onto datanodes that are not local to
>>> the tablets. I believe the remedy for this was to turn off the balancer
>>> or not have it run.
>>>
>>> [1] http://www.swiss-scalability.com/2013/08/hadoop-hdfs-balancer-explained.html
>>>
>>> On Thu, Jun 19, 2014 at 10:07 AM, David Medinets
>>> <[email protected]> wrote:
>>>
>>>> At the Accumulo Summit and on a recent client site, there have been
>>>> conversations about Data Locality and Accumulo.
>>>>
>>>> I ran an experiment to see that Accumulo can scan tables when the
>>>> tserver process is run on a server without a datanode process. I
>>>> followed these steps:
>>>>
>>>> 1. Start three node cluster
>>>> 2. Load data
>>>> 3. Kill datanode on slave1
>>>> 4. Wait until Hadoop notices dead node.
>>>> 5. Kill tserver on slave2
>>>> 6. Wait until Accumulo notices dead node.
>>>> 7. Run the accumulo shell on master and slave1 to verify entries can
>>>> be scanned.
>>>>
>>>> Accumulo handled this situation just fine. As I expected.
>>>>
>>>> How important (or not) is it to run tserver and datanode on the same
>>>> server?
>>>> Does the Data Locality implied by running them together exist?
>>>> Can the benefit be quantified?

-- 
Donald Miner
Chief Technology Officer
ClearEdge IT Solutions, LLC
Cell: 443 799 7807
www.clearedgeit.com
