Hi Tobias,
On 01/18/2012 10:55 PM, Tobias Wunden wrote:
Hi David,
On our recently upgraded system (1.3.x) I'm seeing repeated failures to distribute; the
workflow just stalls on the "Distributing media to progressive downloads" operation.
Where is Matterhorn trying to copy the files to? Is it a network share or a
local disk? Does it start copying the files, and if so, do you see the file
size increase?
The shared volume is a single NFS volume mounted on the admin node, so
hard linking is enabled.
I've so far tried:
1) Not having the workers talk to the admin via mod_proxy
2) Doubled the memory on the admin.
Why do you think it could be a memory issue? The
distribution service first copies the files to the local workspace and then
copies them to the final destination, either using file channels (which use the
native OS for copying) or in chunks of 1 MB.
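To make the two copy strategies concrete, here is a minimal Java sketch of both. It is not the distribution service's actual code; the class name CopySketch and both method names are made up for illustration. The point is that neither approach should need more than about 1 MB of heap, regardless of file size.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only, not taken from the Matterhorn codebase.
final class CopySketch {

    static final int CHUNK_SIZE = 1024 * 1024; // 1 MB

    /** Copies via file channels, letting the OS move the bytes where it can. */
    static long copyWithChannels(Path source, Path target) throws IOException {
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(target, StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            long size = in.size();
            long position = 0;
            // transferTo may move fewer bytes than requested, so loop until done.
            while (position < size) {
                long moved = in.transferTo(position, size - position, out);
                if (moved <= 0) {
                    break; // no progress, e.g. the source shrank underneath us
                }
                position += moved;
            }
            return position;
        }
    }

    /** Copies through a fixed 1 MB buffer, so heap use does not grow with file size. */
    static long copyInChunks(Path source, Path target) throws IOException {
        byte[] buffer = new byte[CHUNK_SIZE];
        long total = 0;
        try (InputStream in = Files.newInputStream(source);
             OutputStream out = Files.newOutputStream(target)) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
                total += read;
            }
        }
        return total;
    }
}
```

Both methods return the number of bytes copied, which can be compared against the source file size to confirm the copy completed.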
I had seen an out of memory error on the admin node, and have since been
able to reproduce it. The first run with YourKit also seems to
indicate that something is not right with memory usage. Will post more when I
have an analysis.
Before I did 1. I did see messages that "distributing file x timed out. retrying in
y ms", but never saw any evidence of a retry. The last job after I made change 1
just failed after putting all 3 nodes in the cluster under heavy load.
That is *really* strange. I tried to find that log message (or similar ones) in
the codebase without any luck. Do you think you could possibly get the correct
message from the logs?
2012-01-18 21:01:44 INFO (TrustedHttpClientImpl:338) - Sleeping
557459ms before trying request
http://media.uct.ac.za:8080/distribution/streaming/dispatch again due to
a HTTP/1.1 401 Nonce has expired/timed out
This is on the admin node.
The media packages in question are in the region of 1.2 GB and contain just over
an hour of recordings. Smaller media packages have no issues, which leads me to
suspect that there is a regression in this service and that it is doing something like
reading the files into memory.
A timeout seems likely. What could be possible is that we are hitting a bug
with downloading large files from the working file repository. Do you have the
working file repository root configured to the same share as the workspace root
(i.e. is hard linking enabled)?
Yes
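One way to double-check that hard links really work between the two roots is a small probe like the Java sketch below. It is an assumption-laden illustration, not how Matterhorn itself decides; the class name HardLinkCheck and the method canHardLink are hypothetical. It creates a temporary file in one directory, tries to hard-link it into the other, and cleans up afterwards.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration only, not taken from the Matterhorn codebase.
final class HardLinkCheck {

    /**
     * Returns true if a file in sourceDir can be hard-linked into targetDir.
     * Hard links generally require both directories to be on the same volume.
     */
    static boolean canHardLink(Path sourceDir, Path targetDir) throws IOException {
        Path probe = Files.createTempFile(sourceDir, "linkprobe", ".tmp");
        Path link = targetDir.resolve(probe.getFileName().toString() + ".link");
        try {
            Files.createLink(link, probe); // arguments: new link, then existing file
            return true;
        } catch (IOException | UnsupportedOperationException e) {
            // Different volumes, or a file system without hard link support.
            return false;
        } finally {
            // Remove the probe files whether or not linking succeeded.
            Files.deleteIfExists(link);
            Files.deleteIfExists(probe);
        }
    }
}
```

Calling canHardLink with the working file repository root and the workspace root (paths as configured on your system) should return true if both live on the same NFS volume and the server permits links.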
Thanks,
Tobias
_______________________________________________
Matterhorn mailing list
[email protected]
http://lists.opencastproject.org/mailman/listinfo/matterhorn
To unsubscribe please email
[email protected]
_______________________________________________