[jira] Commented: (HIVE-176) structured log for obtaining query stats/info

Suresh Antony (JIRA) Thu, 15 Jan 2009 11:17:32 -0800

    [ https://issues.apache.org/jira/browse/HIVE-176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12664204#action_12664204 ]


Suresh Antony commented on HIVE-176:
------------------------------------

    * inferNumReducers(): instead of two calls to HiveHistory, we can make just 
one call at the end of the function, once numReducers has definitely been set. 
We could also set NUM_REDUCERS to 0 when no reducer is specified (more 
informative imho).
    --- Made it a single call after this function call.
    * I still don't see why HAS_REDUCE_TASKS and NUM_REDUCE_TASKS are 
meaningful counters. What is the use case?
    --- Removed both of these variables.
    * In TestHiveHistory, please use the setUp() method or a constructor to do 
initialization. Also, a negative test case would be good (for example, to check 
that a failed job status is being captured).
    --- Moved this code to setUp().
    * HiveHistoryViewer: the indentation is badly off. I think we are also 
following a general convention of '} else {' (and of curly braces on the same 
line as the function/class declaration, viz. 'void init() {').
    --- Re-formatted using the Eclipse formatter.
    * JOB_STATUS and TASK_STATUS are both unused.
    * I couldn't understand this code block in parseHiveHistory:
      + if (!line.trim().endsWith("\"")){
      +   continue;
      + }
      Can you explain?
    --- The format is key="value", so a line that does not end with " means 
the value contains a newline and continues on the next line.

    * parseLine: I'm confused that we have a regex group for the key but are 
not using it. That seems weird: if you had groups for both key and value, you 
wouldn't need to split. Alternatively, you can rely on just the split.
    --- Cut and pasted this code from the JobHistory parser.
    * getHiveHistory: I don't think it's a good idea to initialize the 
hivehistory object on demand:
      a) you always need it
      b) it prints stuff to the console (the log file location). If you want a 
deterministic location for this log, we should just initialize hivehistory at 
session initialization, so that the log file location always appears at the 
beginning of the session (and not at some random point when the code actually 
requires it).

    --- Moved hiveHistory initialization to the constructor of SessionState.
    * It would be good to have an example of the hive history file/format 
checked in somewhere, with a pointer to it from the documentation (either the 
README or the wiki).
    --- Put a short summary of the HistoryLog on the internal wiki.
           http://www.intern.facebook.com/intern/wiki/index.php/HiveQueryLog
    * Another easy and comprehensive test to add is in TestCliDriver. This is 
generated code that fires a bunch of queries; we should easily be able to use 
HiveHistoryViewer to assert that the query status is successful for all queries 
in positive tests.
    --- Added a hiveHistory check to TestCliDriver. For this to work, I changed 
QTestUtil: SessionState is now constructed in the constructor of QTestUtil. Not 
sure whether this is the correct way.
    --- Changed TestCliDriver.vm to check the history file.
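The multi-line handling explained above (a line that does not end with " means the value contains a newline) and the suggestion to capture both key and value with regex groups can be combined in a small parser. This is a hypothetical sketch, not the parseHiveHistory/parseLine code from the patch: it assumes each record is a record type followed by space-separated KEY="value" pairs and that values never contain an escaped quote. The class name and the record types and keys used in main() are illustrative only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HistoryParserSketch {
  // Assumed record shape: RecordType KEY="value" KEY="value" ...
  // One regex group for the key and one for the value, so no split() is needed.
  // Assumption: values never contain an escaped double quote.
  private static final Pattern PAIR = Pattern.compile("(\\w+)=\"([^\"]*)\"");

  /** Parses records; a line that does not end in '"' continues on the next line. */
  public static List<Map<String, String>> parse(String text) {
    List<Map<String, String>> records = new ArrayList<>();
    StringBuilder pending = new StringBuilder();
    for (String line : text.split("\n")) {
      pending.append(line);
      if (!line.trim().endsWith("\"")) {
        // The value is still open: keep the newline and read the next line.
        pending.append('\n');
        continue;
      }
      records.add(parseLine(pending.toString()));
      pending.setLength(0);
    }
    return records;
  }

  /** Extracts every KEY="value" pair from one (possibly multi-line) record. */
  static Map<String, String> parseLine(String record) {
    Map<String, String> values = new LinkedHashMap<>();
    Matcher m = PAIR.matcher(record);
    while (m.find()) {
      values.put(m.group(1), m.group(2));
    }
    return values;
  }

  public static void main(String[] args) {
    // Illustrative record types and keys, not necessarily those used by the patch.
    String log = "QueryStart QUERY_ID=\"q1\" QUERY_STRING=\"select *\nfrom t\"\n"
        + "QueryEnd QUERY_ID=\"q1\" QUERY_RET_CODE=\"0\"\n";
    for (Map<String, String> record : parse(log)) {
      System.out.println(record);
    }
  }
}
```

Because the value group is [^"]*, it also matches embedded newlines, so a reassembled multi-line record is parsed by the same regex as a single-line one.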

One thing I am concerned about overall is the use of the term 'job' for what is 
essentially a hive query. I think this creates a lot of room for confusion, 
since in the hadoop ecosystem job means hadoop job. (We have also overloaded 
the word task in Hive, which is unfortunate, but it is almost too late now.) If 
possible, I would really appreciate it if we could replace 'job' with 'query' 
wherever applicable (s/startJob/startQuery/, for example).
     --- Changed all Job references to Query.

    --- Should we always create the history file? Proposal: history will be 
disabled by default and enabled by setting the jobconf parameter 'enable.job.history'.
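The opt-in behaviour proposed above could look roughly like the following. This is a minimal sketch, assuming the flag name 'enable.job.history' from the comment and using java.util.Properties as a stand-in for the Hadoop jobconf; the class, its methods, and the record/key names are hypothetical. The writer is obtained through a Supplier, so the opener (which in real code would create the history file) is never invoked when history is disabled.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Properties;
import java.util.function.Supplier;

public class HistoryToggleSketch {
  // Flag name from the proposal above; history stays off unless it is set to "true".
  static final String ENABLE_HISTORY = "enable.job.history";

  private final PrintWriter out; // null when history is disabled

  HistoryToggleSketch(Properties conf, Supplier<Writer> opener) {
    boolean enabled = Boolean.parseBoolean(conf.getProperty(ENABLE_HISTORY, "false"));
    // The opener (e.g. one that creates the history file) is only called when enabled.
    this.out = enabled ? new PrintWriter(opener.get(), true) : null;
  }

  boolean isEnabled() {
    return out != null;
  }

  /** Writes one KEY="value" record; a no-op when history is disabled. */
  void log(String recordType, String key, String value) {
    if (out != null) {
      out.println(recordType + " " + key + "=\"" + value + "\"");
    }
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    StringWriter sink = new StringWriter();

    // Default: disabled, so nothing is written.
    new HistoryToggleSketch(conf, () -> sink).log("QueryStart", "QUERY_ID", "q1");
    System.out.println("disabled -> [" + sink + "]");

    // Opt in through the configuration flag.
    conf.setProperty(ENABLE_HISTORY, "true");
    new HistoryToggleSketch(conf, () -> sink).log("QueryStart", "QUERY_ID", "q1");
    System.out.print("enabled -> " + sink);
  }
}
```

Defaulting to "false" means existing sessions see no new files unless someone opts in, which is the trade-off the question above raises.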




> structured log for obtaining query stats/info
> ---------------------------------------------
>
>                 Key: HIVE-176
>                 URL: https://issues.apache.org/jira/browse/HIVE-176
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Logging
>    Affects Versions: 0.2.0
>            Reporter: Joydeep Sen Sarma
>            Assignee: Suresh Antony
>             Fix For: 0.2.0
>
>         Attachments: patch_176.txt, patch_176.txt, patch_176.txt
>
>
> Josh <[email protected]> wrote:
> When launching off hive queries using hive -e is there a way to get the job 
> id so that I can just queue them up and go check their statuses later? What's 
> the general pattern for queueing and monitoring without using the libraries 
> directly?
> I'm gonna throw my vote in for a structured log format. Users could tail it 
> and use whatever queuing or monitoring they wish. It's also probably just a 
> 30-minute project for someone already familiar with the code. I suggest ^A-
> separated key=value pairs per log line.
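Josh's suggested format (one record per line, key=value pairs separated by ^A, the 0x01 control character) can be sketched as a formatter/parser pair. This is a hypothetical illustration, not the format the patch adopted (which, per the discussion above, uses key="value"); the class name and keys are made up, and it assumes keys contain no '=' and that neither keys nor values contain ^A or newlines.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class CtrlALogSketch {
  // ^A is the ASCII control character 0x01.
  static final String SEP = "\u0001";

  /** Formats one log line: key=value pairs joined by ^A. */
  static String format(Map<String, String> fields) {
    StringJoiner line = new StringJoiner(SEP);
    for (Map.Entry<String, String> e : fields.entrySet()) {
      line.add(e.getKey() + "=" + e.getValue());
    }
    return line.toString();
  }

  /** Parses one log line back into key/value pairs, keeping their order. */
  static Map<String, String> parse(String line) {
    Map<String, String> fields = new LinkedHashMap<>();
    for (String pair : line.split(SEP)) {
      // Split on the first '=' only, so values may themselves contain '='.
      int eq = pair.indexOf('=');
      if (eq > 0) {
        fields.put(pair.substring(0, eq), pair.substring(eq + 1));
      }
    }
    return fields;
  }

  public static void main(String[] args) {
    Map<String, String> fields = new LinkedHashMap<>();
    fields.put("QUERY_ID", "q1"); // illustrative keys, not from the patch
    fields.put("QUERY_STRING", "select * from t where a=1");
    String line = format(fields);
    System.out.println(line.replace(SEP, " ^A ")); // make the separator visible
    System.out.println(parse(line).equals(fields));
  }
}
```

Splitting each pair on the first '=' lets values such as query strings contain '=' without escaping; ^A was presumably suggested because it rarely appears in query text (it is also Hive's default field delimiter). Multi-line values would still need escaping, which is the problem the key="value" format above handles with its trailing-quote rule.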

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
