You have NO IDEA what a can of worms this is!

The problem is, real-world HTML is simply NOT STANDARD. That is,
there's a standard, it's kind of loose, and people violate it all the
time. Browsers understand this, and have VERY forgiving parsers.

But a good forgiving parser is a lot harder to write than one that
follows a standard. And there are a whole lot of different ways that
it could work.

If you google, there ARE a number of Java-based HTML parsers out
there. It's been a long time since I've used one. I have, on occasion,
had to write my own.

The first thing to ask is "Why do you want to parse HTML, rather than
XML?" That leads to "What kind of HTML do I have to parse? How many
kinds?"
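
To make the HTML-versus-XML question concrete, here is a rough sketch
of what happens when you hand browser-style HTML to a strict parser.
It assumes the stock javax.xml.parsers API (available on Android as
well as desktop Java); the class and method names are just made up
for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class StrictParseCheck {
    // Hypothetical helper: true only if the markup is well-formed XML.
    public static boolean isWellFormedXml(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            // The strict parser gave up: the input is not well-formed.
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Every tag closed, XML style: accepted.
        System.out.println(isWellFormedXml(
                "<html><body><p>Hi<br/></p></body></html>")); // true
        // Unclosed <br> and <p>, which browsers accept: rejected.
        System.out.println(isWellFormedXml(
                "<html><body><p>Hi<br></body></html>"));      // false
    }
}
```

If all of your input passes a check like this, an XML parser may be
all you need. If it doesn't, you are in forgiving-parser territory.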

If you just want to extract some bits of information, you may be able
to do that with a few well-chosen regular expressions.
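
As a minimal sketch of the regular-expression route, assuming all you
want is the link targets from a page: the class name, the method name,
and the pattern itself are my own illustration, and the pattern is
deliberately narrow (quoted href values only; it knows nothing about
comments or script blocks).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkExtractor {
    // Matches href="..." or href='...' inside an <a ...> tag,
    // ignoring case. Unquoted attribute values are not handled.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the href values in the order they appear in the input.
    public static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(2)); // group 2 is the text between the quotes
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<p>See <A HREF='a.html'>one</a> and "
                + "<a class=\"x\" href=\"http://example.com/b\">two</a><br>";
        System.out.println(extractLinks(html));
        // prints [a.html, http://example.com/b]
    }
}
```

Note that the sample input has an unclosed <p> and <br>, and the
regex does not care. That is the appeal of this approach, and also
its limit: it only works while the bits you want stay this simple.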

Or at the other extreme, you may have a $1 million engineering project
on your hands.

The BEST option is to avoid parsing HTML if at all possible.
Otherwise, the narrower your expectations of what you want to get
from the HTML, the easier it will be to find, adapt, or write a parser
to meet your needs.

On Feb 21, 9:54 pm, Alisha <[email protected]> wrote:
> Hi All,
>
> I have to parse a html file using java. I have gone through a lot of
> html parsers, but seem to understand none of them. So please help me
> out with the type of parser that should be used for an android app and
> how to parse a html file.

-- 
You received this message because you are subscribed to the Google
Groups "Android Developers" group.
