On Fri, Jan 18, 2013 at 09:33:05AM +0800, David Baxendale (GMail - Singapore) 
wrote:
> I don't think Fossil is the right tool for this, take a look at Calibre   
> (http://calibre-ebook.com/)  as an Open Source document management  
> system, not just an e-book reader.

Calibre can't handle several hundred thousand documents, and it
can't do full-text search on the document body -- at least the
versions I used couldn't.

> Calibre manages your e-book/book/PDF collection and can sort the books  
> in your library by: Title, Author, Date added, Date published, Size,  
> Rating, Series, etc. In addition, it supports extra searchable metadata:
>
>  * Tags: A flexible system for categorizing your collection however you
>    like
>  * Comments: A long form entry that you can use for book description,
>    notes, reviews, etc
>  * User fields, so you can have a revision code, or you could include
>    the revision code in the title (probably better), for example

That is only an option for small, hand-curated document stores.
Imagine having to deal with hundreds of millions or billions
of documents. Volumes like that can only be processed automatically.

> You can easily search your collection for a particular book. Calibre  
> supports searching any and all of the fields mentioned above. You can  
> construct advanced search queries by clicking the helpful "Advanced  
> search" button to the left of the search bar.
>
> You can export arbitrary subsets of your collection to your hard disk  
> arranged in a fully customizable folder structure.
>
> For group access Calibre has a built-in web server that allows you to  
> access your collection using a simple browser from any computer anywhere  
> in the world. It can also email your books and downloaded news to you  
> automatically. It has support for mobile devices, so you can browse your  
> collection and download books from your smartphone, Kindle, etc.
>
> One point to note is that systems files the documents by Author/Title on  
> the hard disk, this is fixed and you cannot change this. However, this  
> is not as inflexible as it sounds, because the Author could be a Client,  
> Journal, or whatever you wish.

A good way to organize documents, short of using a real database,
is to name them by the cryptographic hash of their content, and
to store them in directories named by the first octet of the hash
(with subdirectories named by the second octet, and further levels
for extremely large collections).
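A minimal sketch of that layout in Python follows. The choice of
SHA-256, the two-level depth, and the names content_path, sha256_of
and store are assumptions made for illustration only; any
cryptographic hash and any directory depth would work the same way.

```python
import hashlib
import shutil
from pathlib import Path


def content_path(root: Path, digest: str) -> Path:
    # One octet is two hex characters: the first octet names the
    # directory, the second octet names the subdirectory.
    return root / digest[:2] / digest[2:4] / digest


def sha256_of(path: Path) -> str:
    # Hash the file in 1 MiB chunks so large documents need not fit in memory.
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def store(root: Path, source: Path) -> Path:
    # Copy a document into the store under its content hash.
    digest = sha256_of(source)
    target = content_path(root, digest)
    if not target.exists():  # identical content is stored only once
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(source, target)
    return target
```

With two levels of one octet each, the files spread over 65536
directories, and storing the same document twice is a no-op because
the name depends only on the content.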

You would still use a real database to find the documents.

> I use Calibre for my technical library with over 8000 technical papers  

Library Genesis (both content and source code freely available)
currently has over 0.85 million volumes, and will probably reach
several million before very long.

It would be a good idea if somebody extended the libgen codebase
to support full-text indexed search of the document body.
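To illustrate what indexing the document body means, here is a toy
inverted index in Python. It is not taken from the libgen codebase
and assumes the text has already been extracted from the documents;
the class name InvertedIndex and its methods are made up for this
sketch. A collection of that size would need a proper search engine
or a database with full-text support rather than an in-memory dict.

```python
import re
from collections import defaultdict


class InvertedIndex:
    def __init__(self):
        # Maps each term to the set of document ids containing it.
        self.postings = defaultdict(set)

    @staticmethod
    def tokenize(text):
        # Lowercase and split on anything that is not a letter or digit.
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, doc_id, body):
        for term in self.tokenize(body):
            self.postings[term].add(doc_id)

    def search(self, query):
        # Return the ids of documents containing every query term.
        terms = self.tokenize(query)
        if not terms:
            return set()
        result = None
        for term in terms:
            # .get avoids creating empty entries for unknown terms.
            ids = self.postings.get(term, set())
            result = set(ids) if result is None else result & ids
        return result
```

The document ids could be the content hashes used as file names, so
a search hit leads directly to the stored file.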

> and have found it an indispensable tool for managing and finding  
> information.
_______________________________________________
fossil-users mailing list
fossil-users@lists.fossil-scm.org
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
