Tim Armstrong created IMPALA-7214:
-------------------------------------

             Summary: Lots of misleading/incorrect use of DataNode in Impala docs
                 Key: IMPALA-7214
                 URL: https://issues.apache.org/jira/browse/IMPALA-7214
             Project: IMPALA
          Issue Type: Documentation
          Components: Docs
    Affects Versions: Impala 2.12.0
            Reporter: Tim Armstrong
            Assignee: Alex Rodoni


The docs tend to conflate DataNodes (an HDFS service) and Impala daemons. I 
think this stems from the original deployment practice of always colocating 
Impala daemons with HDFS DataNodes, so that HDFS data could always be read from 
a local DataNode.

I'm a bit pedantic, so the conflation feels wrong to me regardless, but I think 
it will become increasingly confusing as alternative deployments without 
colocated HDFS DataNodes become more common (e.g. running against S3, or running 
with a separate HDFS service).

For example, picking one at random:
{noformat}
        In Impala 1.4.0 and higher, the <codeph>LIMIT</codeph> clause is now optional (rather than required) for
        queries that use the <codeph>ORDER BY</codeph> clause. Impala automatically uses a temporary disk work area
        to perform the sort if the sort operation would otherwise exceed the Impala memory limit for a particular
        DataNode.
{noformat}

This is wrong because the memory limit applies to an Impala daemon, which is the 
process that does the actual sorting. So here I think it should say "Impala 
daemon" instead of "DataNode".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)