I'm working a bit on the scanningcabinet port and have two data-model questions that might be related. They might also be generic enough to be handled by a single, Camlistore-wide "dataSource" predicate:
1. How do I store a text blob created from an image?
OCR is expensive (maybe lots of CPU cycles locally, maybe I actually pay to pass it through a service), so I don't want the text stored in an index where each server must extract it again. A full-text index could search through these blobs, but I want to create the blob only once across my network of camlistores. The text is also lossy, so the canonical source is still the image, and that's where the permanode should be.

2. How do I store an image extracted from a PDF?
In this case, I'm pulling a PDF apart by generating an image for each page. The images have permanodes and tags, so they are objects in their own right. They are lossy, though, so I might need to refer back to the parent PDF to read something properly.

Is there a concept of "camlistore:dataSource", or something similar, that I should be using?
