[jira] [Commented] (HIVE-21676) use a system table as an alternative proc store

Sergey Shelukhin (JIRA) Thu, 02 May 2019 10:40:58 -0700


    [ https://issues.apache.org/jira/browse/HIVE-21676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16831806#comment-16831806 ]


Sergey Shelukhin commented on HIVE-21676:
-----------------------------------------

Lol, no, it's supposed to be an HBase ticket.

> use a system table as an alternative proc store
> -----------------------------------------------
>
>                 Key: HIVE-21676
>                 URL: https://issues.apache.org/jira/browse/HIVE-21676
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Priority: Major
>
> We keep hitting these issues:
> {noformat}
> 2019-04-30 23:41:52,164 INFO  [master/master:17000:becomeActiveMaster] procedure2.ProcedureExecutor: Starting 16 core workers (bigger of cpus/4 or 16) with max (burst) worker count=160
> 2019-04-30 23:41:52,171 INFO  [master/master:17000:becomeActiveMaster] util.FSHDFSUtils: Recover lease on dfs file .../MasterProcWALs/pv2-00000000000000000481.log
> 2019-04-30 23:41:52,176 INFO  [master/master:17000:becomeActiveMaster] util.FSHDFSUtils: Recovered lease, attempt=0 on file=.../MasterProcWALs/pv2-00000000000000000481.log after 5ms
> 2019-04-30 23:41:52,288 INFO  [master/master:17000:becomeActiveMaster] util.FSHDFSUtils: Recover lease on dfs file .../MasterProcWALs/pv2-00000000000000000482.log
> 2019-04-30 23:41:52,289 INFO  [master/master:17000:becomeActiveMaster] util.FSHDFSUtils: Recovered lease, attempt=0 on file=.../MasterProcWALs/pv2-00000000000000000482.log after 1ms
> 2019-04-30 23:41:52,373 INFO  [master/master:17000:becomeActiveMaster] wal.WALProcedureStore: Rolled new Procedure Store WAL, id=483
> 2019-04-30 23:41:52,375 INFO  [master/master:17000:becomeActiveMaster] procedure2.ProcedureExecutor: Recovered WALProcedureStore lease in 206msec
> 2019-04-30 23:41:52,782 INFO  [master/master:17000:becomeActiveMaster] wal.ProcedureWALFormatReader: Read 1556 entries in .../MasterProcWALs/pv2-00000000000000000482.log
> 2019-04-30 23:41:55,370 INFO  [master/master:17000:becomeActiveMaster] wal.ProcedureWALFormatReader: Read 28113 entries in .../MasterProcWALs/pv2-00000000000000000481.log
> 2019-04-30 23:41:55,384 ERROR [master/master:17000:becomeActiveMaster] wal.WALProcedureTree: Missing stack id 166, max stack id is 181, root procedure is Procedure(pid=289380, ppid=-1, class=org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure)
> 2019-04-30 23:41:55,384 ERROR [master/master:17000:becomeActiveMaster] wal.WALProcedureTree: Missing stack id 178, max stack id is 181, root procedure is Procedure(pid=289380, ppid=-1, class=org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure)
> 2019-04-30 23:41:55,389 ERROR [master/master:17000:becomeActiveMaster] wal.WALProcedureTree: Missing stack id 359, max stack id is 360, root procedure is Procedure(pid=285640, ppid=-1, class=org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure)
> {noformat}
> After that, the procedure(s) are lost and the cluster is stuck permanently.
> There were no errors in the log when writing these files, and no issues reading
> them from HDFS, so this is purely a data-loss issue in the store's own structure.
> I was thinking about debugging it, but on second thought, what we are trying to
> store is just a PB (protobuf) blob, by key.
> Coincidentally, we have an "HBase" facility, which we already deploy, that does
> just that... and it even has a WAL implementation. I don't know why we cannot
> use it for procedure state instead of inventing another complex implementation
> of a KV store inside a KV store.
> In all or most cases we don't even support rollback and just use the latest
> state, but if we need multiple versions, this HBase product even supports that!
> I think we should add an hbase:proc table that would be maintained similarly to
> meta. That maintenance part, especially given the existing code for meta, should
> be much simpler than a separate store implementation.
> This should be pluggable and optional via the ProcStore interface (made more
> abstract as needed: update state, scan state, get).
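
The three operations named in the proposal (update state, scan state, get) can be sketched as a small interface. This is a minimal, hypothetical sketch, not an existing HBase API: the method names, the `StateVisitor` callback, and the `InMemoryProcStore` class are assumptions for illustration, and an in-memory sorted map stands in for the proposed hbase:proc table. Each procedure id maps to its latest serialized (for example protobuf) state, matching the "PB blob, by key" description and the point that normally only the latest state is needed.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical procedure-store abstraction with the three operations from the
// proposal. Names are illustrative assumptions, not HBase's real interface.
interface ProcStore {
    // Persist the latest serialized (e.g. protobuf) state for a procedure id.
    void update(long procId, byte[] state);

    // Return the latest state for a procedure id, or empty if none is stored.
    Optional<byte[]> get(long procId);

    // Visit every stored procedure in ascending procedure-id order.
    void scan(StateVisitor visitor);

    @FunctionalInterface
    interface StateVisitor {
        void visit(long procId, byte[] state);
    }
}

// In-memory stand-in for a table-backed store (assumed to be the hbase:proc
// table in a real implementation). Keeps only the latest state per id.
class InMemoryProcStore implements ProcStore {
    // Sorted by procedure id, like row keys in a table.
    private final ConcurrentSkipListMap<Long, byte[]> rows = new ConcurrentSkipListMap<>();

    @Override
    public void update(long procId, byte[] state) {
        // Overwrite any previous state: "use the latest state".
        rows.put(procId, state.clone());
    }

    @Override
    public Optional<byte[]> get(long procId) {
        byte[] state = rows.get(procId);
        return state == null ? Optional.empty() : Optional.of(state.clone());
    }

    @Override
    public void scan(StateVisitor visitor) {
        for (Map.Entry<Long, byte[]> e : rows.entrySet()) {
            visitor.visit(e.getKey(), e.getValue().clone());
        }
    }
}
```

Keeping callers behind an interface like this is what would let a table-backed store be pluggable and optional next to the existing WAL-based store; a multi-version variant could additionally expose older values instead of overwriting them.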

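The ERROR lines in the quoted log report gaps in a root procedure's stack ids, after which the procedure cannot be restored. The sketch below is a simplified, hypothetical illustration of that kind of consistency check, not HBase's `WALProcedureTree` code; the `StackIdCheck` class and `missingStackIds` method are invented names, and the rule it applies (a complete tree contains every stack id from 0 up to the maximum id seen) is an assumption inferred only from the wording of the messages.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical illustration of a stack-id gap check; not HBase code.
// Assumption: a fully recovered procedure tree has every stack id in [0, max].
final class StackIdCheck {
    private StackIdCheck() {}

    // Returns the ids in [0, max] that never appeared among the recovered ids.
    static List<Integer> missingStackIds(int[] recoveredIds) {
        BitSet seen = new BitSet();
        int max = -1;
        for (int id : recoveredIds) {
            seen.set(id);               // ids are assumed to be non-negative
            max = Math.max(max, id);
        }
        List<Integer> missing = new ArrayList<>();
        // Walk the unset bits up to the highest id that was seen.
        for (int id = seen.nextClearBit(0); id <= max; id = seen.nextClearBit(id + 1)) {
            missing.add(id);
        }
        return missing;
    }
}
```

For example, recovered ids 0 to 181 without 166 and 178 yield `[166, 178]`, the same two ids reported for pid=289380 in the log.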


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
