[ 
https://issues.apache.org/jira/browse/HIVE-176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12663027#action_12663027
 ] 

Joydeep Sen Sarma commented on HIVE-176:
----------------------------------------

- inferNumReducers(): instead of two calls to the hivehistory - can just make 
one call at the end of the function when the numReducers has been set for sure. 
We could also set NUM_REDUCERS to 0 when no reducer is specified (more 
informative imho).
- I still don't see why HAS_REDUCE_TASKS and NUM_REDUCE_TASKS are meaningful 
counters. what is the use case?
- In TestHiveHistory - please use setup() method or constructor to do 
initialization. also a negative test case would be good (to check if negative 
job status is being captured for example).
- HiveHistoryViewer - indentation is badly off. I think we are following a 
general convention of '} else {' as well (and curly braces on same line as 
function/class declaration - viz 'void init() {').
- JOB_STATUS and TASK_STATUS are both unused.
- i couldn't understand this code block in parseHiveHistory: 
+       if (!line.trim().endsWith("\"")){
+         continue; 
+       }
   can u explain.
- parseLine: confused that we have a reg ex group for the key - but are not 
using it .. seems weird - if u had groups for both key and value u wouldn't 
need to split. alternately u can rely on just the split.
- getHiveHistory - i don't think it's a good idea to initialize hivehistory 
object on demand:
  a) u always need it
  b) it prints stuff to the console (log file location). if u want a 
deterministic location for this log - we should just initialize hivehistory at 
session initialization so that the log file location always comes at the 
beginning of the session (and not at some random point when the code actually 
requires it)
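
To illustrate the parseLine point above: a minimal sketch of reading both key 
and value from regex groups, so that no split is needed. The class name, the 
pattern, and the assumed line format (space-separated KEY="value" pairs, no 
embedded double quotes in values) are illustrative assumptions, not taken from 
the patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not the parseLine from the patch.
class KeyValueLineParser {
    // Group 1 captures the key, group 2 the quoted value.
    // Assumption: values never contain a double quote.
    private static final Pattern PAIR = Pattern.compile("(\\w+)=\"([^\"]*)\"");

    // Returns the key/value pairs found on one line, in order of appearance.
    static Map<String, String> parseLine(String line) {
        Map<String, String> pairs = new LinkedHashMap<String, String>();
        Matcher m = PAIR.matcher(line);
        while (m.find()) {
            pairs.put(m.group(1), m.group(2));
        }
        return pairs;
    }
}
```

With this shape, a line such as QUERY_ID="q1" NUM_REDUCERS="0" yields two 
entries, and text that does not match the pattern is simply skipped. Values 
that span lines or contain quotes would need extra handling.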

- it would be good to have an example of the hive history file/format checked 
in somewhere with a pointer to it from the documentation (either in README or 
wiki). 
- another easy and comprehensive test to add is in TestCliDriver. This is 
generated code that fires a bunch of queries - we should be easily able to use 
HiveHistoryViewer to assert that query status is successful for all queries in 
positive tests.

One thing i am concerned about overall is the use of the term 'job' for what is 
essentially a hive query. I think this creates a lot of room for confusion - 
since in the hadoop ecosystem job means hadoop job. (we have also overloaded 
the word task in Hive - which is unfortunate - but almost too late now). If 
possible - i would really appreciate if we could replace 'job' with 'query' 
wherever applicable. (s/startJob/startQuery/ for example).

> structured log for obtaining query stats/info
> ---------------------------------------------
>
>                 Key: HIVE-176
>                 URL: https://issues.apache.org/jira/browse/HIVE-176
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Logging
>    Affects Versions: 0.2.0
>            Reporter: Joydeep Sen Sarma
>            Assignee: Suresh Antony
>             Fix For: 0.2.0
>
>         Attachments: patch_176.txt, patch_176.txt, patch_176.txt
>
>
> Josh <[email protected]> wrote:
> When launching off hive queries using hive -e is there a way to get the job 
> id so that I can just queue them up and go check their statuses later? What's 
> the general pattern for queueing and monitoring without using the libraries 
> directly?
> I'm gonna throw my vote in for a structured log format. Users could tail it 
> and use whatever queuing or monitoring they wish. It's also probably just a 
> 30 minute project for someone already familiar with the code. I suggest ^A 
> separated key=value pairs per log line.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
