On 08/02/2011 06:43 PM, James Taylor wrote:
> On Aug 2, 2011, at 10:12 AM, Andrew Straw wrote:
>> 1) My first specific problem is that loading many datasets (e.g. 250)
>> into history causes the javascript running locally within a browser to
>> be extremely slow.
> What browser are you using?
Primarily Firefox 3.6.18 as packaged by Ubuntu 10.04 amd64. I know more
recent browsers have faster JS interpreters, but I'm hoping that JS
interpreter speed optimizations will be largely irrelevant with the
dataset container proposal I made. (I just tested, and
it's certainly true that Google Chromium 12.0.742.112 on the same system
is much faster.)

>> 2) My second specific problem is that applying a workflow with N steps
>> to many datasets creates even more datasets (Nx250 additional datasets).
>> In addition to the slow Javascript problem, there seem to be other
>> issues I haven't diagnosed further, but the console in which I'm running
>> run.sh indicates many errors of the type "Exception AssertionError:
>> AssertionError('State <sqlalchemy.orm.state.MutableAttrInstanceState
>> object at 0x7f5c18c47990> is not present in this identity map',) in
>> <bound method MutableAttrInstanceState._cleanup of
>> <sqlalchemy.orm.state.MutableAttrInstanceState object at
>> 0x7f5c18c47990>> ignored". Furthermore the webserver gets slow and my
>> nginx frontend proxy gives 504 gateway time-outs.
> Yes, creating all the jobs and datasets for a workflow is relatively slow 
> right now. We have some optimizations for this that are not in the mainline 
> (not well tested) however there is a limit to how fast it can be with so many 
> new datasets and objects being created. 
> The better solution is probably to move workflow creation into a background 
> process. Starting the workflow would just save the initial state, and a 
> background process would actually create all the datasets and jobs and get it 
> running. The downside is that the history would not be completely populated 
> by the time the page had returned. 

There could be a little checkbox in the run workflow page which allows
the user to decide whether to fork the workflow creation process. Then
the user could do so for a big job but keep synchronous behavior for
workflows in which just a few jobs are scheduled.
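
To make the idea concrete, here is a rough sketch of what I have in mind.
This is not Galaxy code: WorkflowScheduler, create_datasets_and_jobs and
the "background" flag are all hypothetical names I made up for
illustration, and the real dataset/job creation would of course go
through the database rather than build a list in memory.

```python
import queue
import threading


def create_datasets_and_jobs(steps, inputs):
    """Hypothetical stand-in for the slow part: one job per (step, dataset)."""
    return [(step, dataset) for dataset in inputs for step in steps]


class WorkflowScheduler:
    """Hypothetical scheduler offering synchronous or background creation."""

    def __init__(self):
        self._pending = queue.Queue()
        self.results = {}
        # A single daemon worker drains the queue of saved invocations.
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def _run(self):
        while True:
            invocation_id, steps, inputs = self._pending.get()
            try:
                self.results[invocation_id] = create_datasets_and_jobs(steps, inputs)
            finally:
                self._pending.task_done()

    def invoke(self, invocation_id, steps, inputs, background=False):
        """Create jobs now, or only save the initial state if background=True."""
        if background:
            # Return immediately; the history is populated later by the worker.
            self._pending.put((invocation_id, steps, inputs))
            return None
        jobs = create_datasets_and_jobs(steps, inputs)
        self.results[invocation_id] = jobs
        return jobs

    def wait(self):
        """Block until every queued invocation has been processed."""
        self._pending.join()
```

The checkbox would simply set "background": unchecked, invoke() returns
the created jobs before the page is rendered; checked, it returns right
away and the history fills in once the worker catches up.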

Andrew D. Straw, Ph.D.
Research Institute of Molecular Pathology (IMP)
Vienna, Austria
