I'm working on the scanningcabinet port and have two data model
questions. They may be related, and may be generic enough across
camlistore to be answered by a single "dataSource" predicate:

1. How do I store a text blob created from an image?

OCR is expensive (maybe lots of CPU cycles locally, maybe I actually pay to
pass it through a service), so I don't want the text to exist only in an
index that each server has to build by running the extraction itself. A
full-text index could search through these blobs, but I want to create each
blob only once across my network of camlistores. The OCR text is also lossy,
so the canonical source is still the image, and that's where the permanode
should be.

2. How do I store an image extracted from a PDF?

In this case, I'm pulling a PDF apart by generating an image for each page.
The images have their own permanodes and tags, so each one is an object in
its own right. They are lossy, though, so I might need to refer back to the
parent PDF to read something properly.


Is there a concept like "camlistore:dataSource", or something similar, that
I should be using?
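
To make the question concrete, here is a rough sketch of what I imagine for
case 2, written as a bit of Python that builds the JSON. It assumes the link
would be an ordinary set-attribute claim on the page image's permanode whose
value is the blobref of the PDF. The attribute name "camlistore:dataSource"
is only my proposal, not an existing predicate, the function name is made up
for illustration, and the blobrefs are placeholders. The claim is unsigned
(a real one would also need camliSigner and a signature).

```python
import json

# Hypothetical attribute name taken from the question above; this is a
# proposal, not a predicate that Camlistore is known to define.
DATA_SOURCE_ATTR = "camlistore:dataSource"


def data_source_claim(derived_permanode, source_ref, claim_date):
    """Build an unsigned set-attribute claim that points a derived
    object's permanode back at the blob it was generated from."""
    return {
        "camliVersion": 1,
        "camliType": "claim",
        "claimType": "set-attribute",
        "claimDate": claim_date,         # RFC 3339 timestamp
        "permaNode": derived_permanode,  # e.g. the page image's permanode
        "attribute": DATA_SOURCE_ATTR,
        "value": source_ref,             # e.g. the parent PDF's blobref
    }


if __name__ == "__main__":
    # Placeholder blobrefs, for illustration only.
    claim = data_source_claim(
        "sha1-PAGE-IMAGE-PERMANODE",
        "sha1-PARENT-PDF",
        "2014-01-01T00:00:00Z",
    )
    print(json.dumps(claim, indent=2))
```

Case 1 is less obvious to me with this shape, since the permanode lives on
the image and the OCR text is the derived blob.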
