[julia-users] Re: Parallel file access

Steven Sagaert Tue, 18 Oct 2016 02:55:18 -0700

Well, if you want multiple processes to write into the DB, you should use one 
that can handle concurrent writers, i.e. a "real" DB server, not a simple 
desktop/embedded DB like SQLite. So for example Postgres, or if you do not 
want to deal with SQL, a NoSQL DB, e.g. MongoDB (there are many more). For a 
column-store relational DB (good for analytics): MonetDB.
If you still want all the data in one file at the end, then write a program 
that exports the data from the DB to a file once the run is done (that 
program is a single process, so no concurrency issues).


You could also do everything in memory and let it serialize to disk 
asynchronously, e.g. Apache Ignite (there are a bunch of others).
There's also SciDB for an array-oriented DB.

This is just a small sample of possibilities. If you want a pure Julia 
solution, then you could do it with the Julia multiprocessing functionality, 
but you'll have to work with locking to coördinate between the processes 
(i.e. it isn't just the typical trivial "divide and conquer" data 
parallelism anymore).
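
To make the pure Julia route a bit more concrete, here is a minimal sketch. 
Note that it does not use locks: it swaps in a simpler pattern where the 
workers send their results through a RemoteChannel to a single writer task on 
process 1, so only one process ever opens the file. It assumes a recent Julia 
(1.x, with the Distributed standard library). The names compute, run_jobs and 
results.txt, the worker count and the job count are all made up for 
illustration. A plain Channel is local to one process, which is why 
RemoteChannel is used here.

```julia
using Distributed

# Start two workers if none are running yet (the count is arbitrary).
nprocs() == 1 && addprocs(2)
@everywhere using Distributed

# Hypothetical stand-in for the real per-worker computation.
@everywhere compute(i) = "result $i from worker $(myid())"

function run_jobs(njobs, outfile)
    # The channel's buffer lives on process 1; workers get a handle to it.
    results = RemoteChannel(() -> Channel{String}(32))

    # Single writer: only this task on process 1 ever touches the file.
    writer = @async open(outfile, "w") do io
        for _ in 1:njobs
            println(io, take!(results))
        end
    end

    # Workers put! their results on the channel instead of opening the file.
    pmap(1:njobs) do i
        put!(results, compute(i))
        nothing
    end

    wait(writer)  # block until every result has been written
    return outfile
end

outfile = run_jobs(8, joinpath(mktempdir(), "results.txt"))
```

Because every write goes through one task, the lines in the file arrive in 
whatever order the workers finish, but none of them can interleave or collide.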


On Monday, October 17, 2016 at 7:07:28 PM UTC+2, Zachary Roth wrote:
>
> Thanks for the responses.
>
> Ralph, thank you again.  I very much appreciate your "humble offering".
> I'll take a further look into your gist.
>
> Steven, I'm happy to use the right tool for the job...so long as I have an 
> idea of what it is.  Would you care to offer more insights or suggestions 
> for the ill-informed (such as myself)?
>
> ---Zachary
>
>
>
> On Sunday, October 16, 2016 at 7:51:19 AM UTC-4, Steven Sagaert wrote:
>>
>> That's because SQLite isn't a multi-user DB server but a single-user 
>> embedded (desktop) DB. Use the right tool for the job.
>>
>> On Saturday, October 15, 2016 at 7:02:58 PM UTC+2, Ralph Smith wrote:
>>>
>>> How are the processes supposed to interact with the database?  Without 
>>> extra synchronization logic, SQLite.jl gives (occasionally)
>>> ERROR: LoadError: On worker 2:
>>> SQLite.SQLiteException("database is locked")
>>> which on the face of it suggests that all workers are using the same 
>>> connection, although I opened the DB separately in each process.
>>> (I think we should get "busy" instead of "locked", but then still have 
>>> no good way to test for this and wait for a wake-up signal.)
>>> So we seem to be at least as badly off as the original post, except with 
>>> DB calls instead of simple writes.
>>>
>>> We shouldn't have to stand up a separate multithreaded DB server just 
>>> for this. Would you be kind enough to give us an example of simple (i.e. 
>>> not client-server) multiprocess DB access in Julia?
>>>
>>> On Saturday, October 15, 2016 at 9:40:17 AM UTC-4, Steven Sagaert wrote:
>>>>
>>>> It still surprises me how in the scientific computing field people 
>>>> still refuse to learn about databases and then replicate database 
>>>> functionality in files in a complicated and probably buggy way. HDF5 is 
>>>> one example; there are many others. If you want to do fancy search (i.e. 
>>>> speed up search via indices) or do things like parallel 
>>>> writes/concurrency, you REALLY should use databases. That's what they 
>>>> were invented for decades ago. Nowadays there's a bigger choice than 
>>>> ever: relational or non-relational (NoSQL), single host or distributed, 
>>>> web interface or not, disk-based or in-memory, ... There really is no 
>>>> excuse anymore not to use a database if you want to go beyond just 
>>>> reading in a bunch of data in one go in memory.
>>>>
>>>> On Monday, October 10, 2016 at 5:09:39 PM UTC+2, Zachary Roth wrote:
>>>>>
>>>>> Hi, everyone,
>>>>>
>>>>> I'm trying to save to a single file from multiple worker processes, 
>>>>> but don't know of a nice way to coordinate this.  When I don't 
>>>>> coordinate, saving works fine much of the time.  But I sometimes get 
>>>>> errors with reading/writing of files, which I'm assuming is happening 
>>>>> because multiple processes are trying to use the same file 
>>>>> simultaneously.
>>>>>
>>>>> I tried to coordinate this with a queue/channel of `Condition`s 
>>>>> managed by a task running in process 1, but this isn't working for me.  
>>>>> I've tried to simplify this to track down the problem.  At least part 
>>>>> of the issue seems to be writing to the channel from process 2.  
>>>>> Specifically, when I `put!` something onto a channel (or `push!` onto 
>>>>> an array) from process 2, the channel/array is still empty back on 
>>>>> process 1.  I feel like I'm missing something simple.  Is there an 
>>>>> easier way to go about coordinating multiple processes that are trying 
>>>>> to access the same file?  If not, does anyone have any tips?
>>>>>
>>>>> Thanks for any help you can offer.
>>>>>
>>>>> Cheers,
>>>>> ---Zachary
>>>>>
>>>>
