There were several issues, but the one I finally got stuck on (well, thought I'd better ask about before mucking things up too badly) was "file.path".
According to the docs, this method is supposed to return the local path for the file (suitable for use by the standard python "open"), or throw an exception for objects that don't have local paths (like S3). Obviously S3 throws the exception. This isn't caught, so things blow up in apps/documents/models.py. For example DocumentVersion.exists calls self.file.path, and crashes there. I hacked around that to see if that was the only problem, but ran into trouble elsewhere for the same reason. Decided I didn't understand the code well enough to fix this - at least not yet.

Cheers,
Ami.

On Mon, Jul 23, 2012 at 9:54 PM, Roberto Rosario <[email protected]> wrote:
> That is why I avoid external binary dependencies like the plague :)
> Otherwise I could just pass the file handle returned by the storage class
> to whatever Python code needs to process a document. But whenever there is
> processing by an external utility Mayan already copies the document file
> locally and treats it like a cached version of the original document:
>
> https://github.com/rosarior/mayan/blob/master/apps/documents/models.py#L120
>
> which in turn calls the Document's latest version's open method:
>
> https://github.com/rosarior/mayan/blob/master/apps/documents/models.py#L487
>
> which in turn calls the storage class open method :)
>
> https://github.com/rosarior/mayan/blob/master/apps/documents/models.py#L493
>
> Never is a document assumed to be local. I tested this decoupling in the
> past storing documents in a GridFS clustered storage and worked, but it was
> quite some time ago so I'm very eager to see what Ami found out and fix it.
>
>
> /Roberto
>
> On Monday, July 23, 2012 4:22:42 PM UTC-4, Nate Aune wrote:
>>
>> I think one problem is that much of the Mayan functionality (OCR,
>> metadata, etc.) expects that the files are on a locally accessible file
>> system, which is not the case if the files are on S3.
>>
>> So it would seem necessary to temporarily download the files from S3 to a
>> /tmp folder to process them with the Unix cmd line tools.
>>
>> I don't have any direct experience with doing this with Mayan, but I know
>> from another project in which we were storing MP3 files in a database, that
>> if we wanted to extract the ID3 tags from the files using id3 cmd line
>> tools, we had to copy the MP3 files out of the database and put them in a
>> temp folder.
>>
>> Nate
>>
>>
>> On Mon, Jul 23, 2012 at 3:42 PM, Carlos Aguilar <[email protected]> wrote:
>>
>>> You can check django-storages to make compatible with S3 your app.
>>>
>>> --
>>> Carlos Aguilar
>>> Consultor Hardware y Software
>>> DWD&Solutions
>>> http://www.dwdandsolutions.com
>>> http://www.houseofsysadmin.com
>>> Cel: 78740173
>>> Oficina: 22693598
>>
>> --
>> [email protected]
>> +1 (617) 517-4953
>> http://twitter.com/natea | http://linkedin.com/in/natea
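To make the file.path failure described at the top of this message concrete, here is a minimal sketch of one way to guard against it. It is not Mayan's code. RemoteOnlyStorage is a hypothetical stand-in for a remote backend such as S3, and local_path_or_copy is a hypothetical helper name. The only behaviour taken from Django is that the base Storage.path() raises NotImplementedError when a backend has no local filesystem path. The fallback follows the approach mentioned in the quoted replies: copy the document to a temporary local file before handing it to an external tool.

```python
import os
import shutil
import tempfile
from io import BytesIO


class RemoteOnlyStorage:
    """Hypothetical stand-in for a remote backend such as S3."""

    def __init__(self, blobs):
        self._blobs = blobs  # maps file name -> bytes

    def path(self, name):
        # Mirrors Django's base Storage.path() for backends that
        # have no local filesystem path.
        raise NotImplementedError("This backend doesn't support absolute paths.")

    def open(self, name, mode="rb"):
        return BytesIO(self._blobs[name])

    def exists(self, name):
        return name in self._blobs


def local_path_or_copy(storage, name):
    """Return (path, is_temporary) for a stored file.

    Uses the storage's own local path when it has one; otherwise
    streams the file into a temporary file and returns that path.
    The caller is responsible for deleting temporary copies.
    """
    try:
        return storage.path(name), False
    except NotImplementedError:
        suffix = os.path.splitext(name)[1]
        with storage.open(name, "rb") as source:
            with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as target:
                shutil.copyfileobj(source, target)
        return target.name, True


# Example usage with the stand-in storage.
storage = RemoteOnlyStorage({"docs/report.txt": b"hello"})
path, is_temporary = local_path_or_copy(storage, "docs/report.txt")
with open(path, "rb") as handle:
    data = handle.read()
if is_temporary:
    os.remove(path)
```

For a pure existence check like the one in DocumentVersion.exists, asking the storage directly (storage.exists(name) is part of Django's storage API) avoids needing a local path at all. Whether that fits the rest of Mayan's code is something I have not verified.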
