What I want to do is simply extract the full text of the document and
save it along with some other form values such as name, surname, and
birthdate. We can do that for the docx format because a docx file is a
ZIP archive, as you said:
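import re
import zipfile

# A .docx file is a ZIP archive; the main body is stored in word/document.xml.
docx = zipfile.ZipFile('/path/to/file/mydocument.docx')
content = docx.read('word/document.xml').decode('utf-8')
# Strip every XML tag, leaving only the text between tags.
cleaned = re.sub(r'<[^>]+>', '', content)
print(cleaned)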
This works with docx files, but as you know, not every Word file we
receive will be in docx format. Using the Google Docs API seems to be the
only solution for now. However, I am worried about what Brandon said:
converting the document and fetching its content might take too long.
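
For the docx case, here is a slightly more careful sketch than the regex above. It parses word/document.xml with the standard library's ElementTree and joins the text runs of each paragraph, so paragraph breaks survive. The names extract_docx_text and build_record, and the shape of the returned dict, are only my assumptions for illustration; they are not part of any App Engine API, and the record would still have to be mapped onto a real datastore model.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml.
W_NS = '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}'


def extract_docx_text(source):
    """Return the body text of a .docx, one line per paragraph.

    `source` may be a file path or a file-like object (e.g. an upload).
    """
    with zipfile.ZipFile(source) as archive:
        xml_bytes = archive.read('word/document.xml')
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(W_NS + 'p'):
        # Text lives in <w:t> elements; join the runs of each paragraph.
        runs = [node.text or '' for node in para.iter(W_NS + 't')]
        paragraphs.append(''.join(runs))
    return '\n'.join(paragraphs)


def build_record(uploaded_bytes, name, surname, birthdate):
    """Bundle the extracted text with the other form values.

    The dict layout is hypothetical; adapt it to the actual model.
    """
    return {
        'name': name,
        'surname': surname,
        'birthdate': birthdate,
        'content': extract_docx_text(io.BytesIO(uploaded_bytes)),
    }
```

Passing the uploaded bytes through io.BytesIO avoids writing a temporary file, which matters on a platform without a writable filesystem. This still does nothing for files that are not in docx format.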
On 17 February, 00:01, Ernesto Karim Oltra <[email protected]> wrote:
> Well, maybe it's not a viable solution, but a Word document (.docx) is a
> ZIP file with a lot of XML text files inside, so if you know the format,
> you can extract and store the words the document contains.
>
> The option I would implement is to receive the document, store it in
> the datastore, and enqueue a task to re-send it to my Google Docs
> account. Then you can use GData, as suggested before, to query words
> across your documents. You can even store them in different
> categories to simulate "filters".
>
> On 16 Feb, 22:44, theone <[email protected]> wrote:
>
> > Do you mean that I cannot cheaply convert and store Word document
> > contents with the Google Docs API?
>
> > On 16 February, 22:00, "Brandon Wirtz" <[email protected]> wrote:
>
> > > You would need to convert the Word doc to text so that it is searchable.
> > > The issue is not storage but conversion. I believe the Google Docs API
> > > is capable enough that you can upload to Docs, save as text or RTF
> > > (which you could parse), and then search.
>
> > > The timing of all these things may not work, since each of the steps
> > > will take several seconds, so you may find that you can't complete an
> > > operation before GAE gives up, or that all that waiting burns through
> > > your budget quickly.
>
> > > -----Original Message-----
> > > From: [email protected]
> > > [mailto:[email protected]] On Behalf Of theone
> > > Sent: Wednesday, February 16, 2011 7:16 AM
> > > To: Google App Engine
> > > Subject: [google-appengine] Storing word document contents in
> > > datastore(Python)
>
> > > Hi,
>
> > > I want to accept Word documents from users and keep their content in
> > > the datastore so that it can be searched.
> > > I searched the internet a lot, but the only solutions I found rely on
> > > third-party libraries such as the OpenOffice API (to export Word
> > > documents). To use the OpenOffice API we would need to install
> > > OpenOffice, and, if I am not wrong, we cannot install it on Google
> > > App Engine.
>
> > > Is there any way to do that?
> > > Thanks in advance.
>
> > > --
> > > You received this message because you are subscribed to the Google Groups
> > > "Google App Engine" group.
> > > To post to this group, send email to [email protected].
> > > To unsubscribe from this group, send email to
> > > [email protected].
> > > For more options, visit this group at
> > > http://groups.google.com/group/google-appengine?hl=en.