[
https://issues.apache.org/jira/browse/FALCON-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359179#comment-14359179
]
Sowmya Ramesh commented on FALCON-1068:
---------------------------------------
Ajay: I don't think any system will let an application hold a lock indefinitely,
as that would result in deadlocks. That said, the logs attached by [~kawaa]
didn't help in figuring out the reason for the deadlock. As I mentioned earlier,
it's good to find the root cause and fix it.
Titan has the config below to prevent applications from holding a lock forever.
The default value is 300,000 ms (5 minutes), after which a lock is considered to
have expired. If, say, a retry had been attempted after that expiry, the next
transaction would have gone through. This is not a clean solution, but I just
wanted to know whether a retry was attempted, since the root cause wasn't clear.
{code}
storage.lock-expiry-time - default value 300,000 ms
{code}
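For reference, a different expiry could be tried through Falcon's graph
properties. This is only a sketch: it assumes Falcon forwards
falcon.graph.*-prefixed entries in startup.properties to Titan's storage
configuration, and the value shown is an example only, not a recommendation.
{code}
# Hypothetical override in conf/startup.properties; assumes falcon.graph.*
# properties are passed through to Titan as storage.* settings.
*.falcon.graph.storage.lock-expiry-time=60000
{code}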
Also, retry logic should be added in Falcon for graph DB operations, as
https://github.com/thinkaurelius/titan/issues/271 mentions (quoted below);
handling transaction failures alone is not going to solve the problem every
time. A sketch of such a retry wrapper follows the quote.
{quote}
When defining unique Titan types with locking enabled (i.e. requesting that
Titan ensures uniqueness) it is likely to encounter locking exceptions under
concurrent modifications to the graph.
com.thinkaurelius.titan.diskstorage.locking.PermanentLockingException: Updated
state: lock acquired but value has changed since read
Such exceptions are to be expected, since Titan cannot know how to recover from
a transactional state where an earlier read value has been modified by another
transaction since this may invalidate the state of the transaction. In most
cases it is sufficient to simply re-run the transaction. If locking exceptions
are very frequent, try to analyze and remove the source of congestion.
{quote}
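For illustration, a minimal sketch of what such a retry wrapper could look like.
The class name, method name, and retry parameters below are hypothetical, not
existing Falcon code, and it assumes Titan's Blueprints-style commit()/rollback()
transaction API.
{code}
// Hypothetical sketch only -- names and retry parameters are illustrative.
import com.thinkaurelius.titan.core.TitanGraph;

public final class GraphTransactionRetry {

    private static final int MAX_ATTEMPTS = 3;
    private static final long BACKOFF_MS = 500;

    private GraphTransactionRetry() {
    }

    /**
     * Applies the given mutation and commits, re-running it when the commit
     * fails (e.g. under lock contention), as the Titan issue quoted above
     * suggests.
     */
    public static void commitWithRetry(TitanGraph graph, Runnable mutation)
            throws InterruptedException {
        for (int attempt = 1; ; attempt++) {
            try {
                mutation.run();      // add/update vertices and edges
                graph.commit();      // may fail under concurrent lock contention
                return;
            } catch (RuntimeException e) {
                graph.rollback();    // discard the failed transaction state
                if (attempt >= MAX_ATTEMPTS) {
                    throw e;         // give up and surface the original error
                }
                Thread.sleep(BACKOFF_MS * attempt);  // simple linear back-off
            }
        }
    }
}
{code}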
> When scheduling a process, Falcon throws "Bad Request;Could not commit
> transaction due to exception during persistence"
> -----------------------------------------------------------------------------------------------------------------------
>
> Key: FALCON-1068
> URL: https://issues.apache.org/jira/browse/FALCON-1068
> Project: Falcon
> Issue Type: Bug
> Reporter: Adam Kawa
> Attachments: falcon.application.log.FALCON-1068.rtf
>
>
> I have a simple script "manage-entity.sh process dss" that deletes, submits,
> and schedules a Falcon process.
> A couple of times per week, I get the "FalconCLIException: Bad Request;Could
> not commit transaction due to exception during persistence" when submitting
> the process.
> The workaround is to restart the Falcon server...
> e.g.:
> {code}
> $ ./manage-entity.sh process dss my-process.xml
> falcon/default/my-process(process) removed successfully (KILLED in ENGINE)
> Stacktrace:
> org.apache.falcon.client.FalconCLIException: Bad Request;Could not commit
> transaction due to exception during persistence
> at
> org.apache.falcon.client.FalconCLIException.fromReponse(FalconCLIException.java:44)
> at
> org.apache.falcon.client.FalconClient.checkIfSuccessful(FalconClient.java:1162)
> at
> org.apache.falcon.client.FalconClient.sendEntityRequestWithObject(FalconClient.java:684)
> at
> org.apache.falcon.client.FalconClient.submitAndSchedule(FalconClient.java:347)
> at org.apache.falcon.cli.FalconCLI.entityCommand(FalconCLI.java:371)
> at org.apache.falcon.cli.FalconCLI.run(FalconCLI.java:182)
> at org.apache.falcon.cli.FalconCLI.main(FalconCLI.java:132)
> $ ./falcon-restart.sh
> Hadoop is installed, adding hadoop classpath to falcon classpath
> Hadoop is installed, adding hadoop classpath to falcon classpath
> falcon started using hadoop version: Hadoop 2.5.0
> $ ./manage-entity.sh process dss my-process.xml
> falcon/default/my-process(process) removed successfully (KILLED in ENGINE)
> schedule/default/my-process(process) scheduled successfully
> submit/falcon/default/Submit successful (process) my-process
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)