Juan,
You wrote:
-> I would like to work with XML, what is the first step?
-> where can i find examples or tutorials??
You may wish to check out www.apache.org and download
Xerces (named after the Xerces Blue butterfly), their
Java-based XML parser. It includes support for SAX and DOM,
in what I believe are the most current release revisions of
those specs.
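
To give a flavor of the two APIs, here is a minimal sketch that pulls the
same element text out of a small document, once with DOM and once with SAX.
It goes through the standard JAXP factories (javax.xml.parsers), which hand
back whatever parser is available on your classpath, Xerces included. The
class name, the sample XML, and the "title" element are all made up for
illustration, and it assumes a reasonably recent JDK.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class XmlFirstSteps {
    // Hypothetical sample document, for illustration only.
    public static final String SAMPLE =
        "<books><book><title>First</title></book>"
      + "<book><title>Second</title></book></books>";

    // DOM: load the whole document into an in-memory tree, then query it.
    public static List<String> titlesWithDom(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        NodeList nodes = doc.getElementsByTagName("title");
        List<String> titles = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            titles.add(nodes.item(i).getTextContent());
        }
        return titles;
    }

    // SAX: the parser streams through the document and calls the handler
    // back for each start tag, run of text, and end tag.
    public static List<String> titlesWithSax(String xml) throws Exception {
        final List<String> titles = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder text; // non-null only while inside <title>

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                if ("title".equals(qName)) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("title".equals(qName)) {
                    titles.add(text.toString());
                    text = null;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return titles;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(titlesWithDom(SAMPLE)); // prints [First, Second]
        System.out.println(titlesWithSax(SAMPLE)); // prints [First, Second]
    }
}
```

Roughly speaking, DOM is the easier one to start with because you get the
whole tree to poke at, while SAX tends to suit large documents since it
never holds the full document in memory.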
Another good source of learning is Jason Hunter's book on
Java and XML (from O'Reilly, I believe). It has some good, simple
examples of use, along with explanations of the reasons behind the
designs and best-practice recommendations. He has a web site as
well, but I do not recall it off the top of my head; maybe
someone on the list will chime in and enlighten us.
----- Original Message -----
From: "Orozco, Juan Carlos" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, May 09, 2001 11:17 AM
Subject: JSP and XML
> I would like to work with XML, what is the first step?
> where can i find examples or tutorials??
>
> regards,
>
> Juan Orozco
> -----Original Message-----
> From: Sylvain Roche [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, May 09, 2001 9:58 AM
> To: [EMAIL PROTECTED]
> Subject: PDF indexing
>
>
> Hi
>
> I've been working on an indexing engine for a while. Everything works fine
> with static or dynamic html. I would now like to be able to retrieve
> information from pdf files. I've found several apis to write dynamic pdf
> documents, but no simple one to parse a document and extract the text
> content.
>
> For now, my indexing engine works this way :
> 1) download a page with a starting url
> 2) parse the content of the document to extract the headers and meta tags
> 3) analyse all the html tags of the page (links, colors, simple forms with
> no user input)
> 4) generate a list of urls referenced in the current page, which are queued
> 5) extract the text content, eliminate the negligible words, and store
> it in a database
> 6) pursue with new url in the queue
>
> My concern is that a pdf document will be used to store a much larger
> amount of data than an html page. I expect this scheme to take very long
> with a pdf. Does anyone have any experience with such a search engine, and
> am I heading in the right direction?
>
> Regards
> Sylvain
>
>
> ===========================================================================
> To unsubscribe: mailto [EMAIL PROTECTED] with body: "signoff JSP-INTEREST".
> For digest: mailto [EMAIL PROTECTED] with body: "set JSP-INTEREST DIGEST".
> Some relevant FAQs on JSP/Servlets can be found at:
>
> http://java.sun.com/products/jsp/faq.html
> http://www.esperanto.org.nz/jsp/jspfaq.html
> http://www.jguru.com/jguru/faq/faqpage.jsp?name=JSP
> http://www.jguru.com/jguru/faq/faqpage.jsp?name=Servlets
>