On 2019-01-03 6:00 a.m., Christian Grün wrote:
If you use Java, there is quite a variety on running queries. Maybe
you could give us some insight into your use case first? For example,
what do you want to do with the result?

Yes, a bit spaghetti-ish, pardon. The idea is to first drop the database, then populate it, then query it. For grabbing XML from w3schools, putting it in a database, and running an XQuery, that works fine.

Moving to HTML, it then sort of works. The db is dropped, then a db is created and populated. Browsing in the GUI I can see, for example, a list of book categories, so there's data to work from (HTML which TagSoup has fixed so that BaseX can parse it).

That's really the end goal: just running XQuery against HTML.

The only query I can get working against the HTML is "text()" or perhaps "/text()", which then returns all the HTML. What I want instead is to traverse the document and pick out specific parts.

It's related, to a degree, to Selenium efforts.

---

The upshot being that the way tagsoup fixes malformed html either causes (me) problems with running xquery queries, or, more likely, I'm not understanding how to run xpath and xquery against the db properly.

The GUI is very interesting in this respect because it lets me visualize the raw data, it's "clickable", and I can type and run XPath queries right in the GUI.

However, the *only* xpath query I can get results on is "text()". Not so with "raw" xml from w3schools. With that xml I can drill down to varying degrees as expected.
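
One thing that may be worth ruling out (an assumption on my part, since I haven't confirmed it against the stored document): by default TagSoup puts the elements it emits into the XHTML namespace, and an unprefixed step such as //h1 only matches elements in no namespace. That could explain why the w3schools XML can be traversed and the HTML cannot. A minimal sketch of the effect, using only the JDK's XPath API rather than BaseX, and a made-up sample document and class name:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class NamespaceSketch {
    // Hypothetical sample: elements sit in the XHTML namespace,
    // as they would in default TagSoup output.
    static final String XHTML =
        "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
        + "<body><h1>Books</h1></body></html>";

    // Parse the string namespace-aware and evaluate an XPath 1.0
    // expression, returning the string value of the result.
    static String eval(String xml, String xpath) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        return XPathFactory.newInstance().newXPath().evaluate(xpath, doc);
    }

    public static void main(String[] args) throws Exception {
        // Unprefixed name test: looks for h1 in no namespace, finds nothing.
        System.out.println("[" + eval(XHTML, "//h1") + "]");
        // Matching on the local name ignores the namespace.
        System.out.println("[" + eval(XHTML, "//*[local-name()='h1']") + "]");
    }
}
```

If this turns out to be the cause, the XQuery equivalent of the second expression would be the wildcard form //*:h1, or a default element namespace declaration in the query prolog. If the stored document has no namespace, then this is not the problem and something else is going on.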

-------

Either tagsoup is mashing the html too extremely, or it's my lack of knowledge.



Hey, I appreciate the input.  Hope I made sense.


-Thufir
