Re: [Haskell-cafe] HXT: how to get sibling element
Thanks to all. I've done it!

===
{-# LANGUAGE TupleSections #-}  -- needed for the ("Title",) section below

import Text.XML.HXT.Core
import Text.XML.HXT.Curl
import Text.XML.HXT.HTTP
import Control.Arrow.ArrowNavigatableTree

pageURL = "http://localhost/test.xml"

main = do
  r <- runX (configSysVars [withCanonicalize no, withValidate no, withTrace 0, withParseHTML no]
             >>> readDocument [withErrors no, withWarnings no, withHTTP []] pageURL
             >>> getChildren >>> isElem >>> hasName "div"
             >>> (getTitle <+> getSections))
  putStrLn "Articles:"
  putStrLn ""
  mapM_ putStrLn $ map (\i -> (fst i) ++ " is " ++ (snd i) ++ "\n") r
  putStrLn ""

-- The first span holds the title text itself.
getTitle = listA (getChildren >>> isElem >>> hasName "span")
           >>> arr head >>> getChildren >>> getText >>> arr trim >>> arr ("Title",)

-- Every other span is a label; its value is the text node that follows it.
getSections = addNav
              >>> listA (getChildren >>> withoutNav (isElem >>> hasName "span"))
              >>> arr tail >>> unlistA
              >>> ((getChildren >>> remNav >>> getText)
                   &&& (listA followingSiblingAxis >>> arr head >>> remNav >>> getText >>> arr (rc . trim)))

ltrim [] = []
ltrim (' ':x) = ltrim x
ltrim ('\n':x) = ltrim x
ltrim ('\r':x) = ltrim x
ltrim ('\t':x) = ltrim x
ltrim x = x

rtrim = reverse . ltrim . reverse

trim = ltrim . rtrim

-- Drop the leading ": " that separates a label from its value.
rc (':':' ':x) = x
rc x = x
===

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
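The hand-written trimming helpers in the solution above could probably be shortened with functions from base. The following is only a sketch, not the poster's code: it reuses the names trim and rc from the message, and it assumes that stripping every character for which isSpace holds is acceptable (the original strips only space, tab, CR and LF).

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd, stripPrefix)
import Data.Maybe (fromMaybe)

-- Remove leading and trailing whitespace.
-- Assumption: isSpace covers a few more characters than the
-- original ltrim/rtrim pair (e.g. form feed, non-breaking space).
trim :: String -> String
trim = dropWhileEnd isSpace . dropWhile isSpace

-- Drop a leading ": " separator if present; otherwise return the input unchanged.
rc :: String -> String
rc s = fromMaybe s (stripPrefix ": " s)
```

With these definitions, (rc . trim) applied to the text node after a label span behaves like the original pipeline step arr (rc . trim).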
[Haskell-cafe] HXT: how to get sibling element
Hello, haskellers.

Suppose we have this XML document (maybe a little stupid):

<div>
  <span>Some story</span>
  <span>Description</span>: This story about...
  <span>Author</span>: Tom Smith
</div>

In the end I want to get this list:

[("Title", "Some story"), ("Description", "This story about..."), ("Author", "Tom Smith")]

or maybe this:

Book "Some story" [("description", "This story about..."), ("Author", "Tom Smith")]

(with data Book = Book String [(String, String)]).

The first span is a special case compared to the others, and I understand how to process it:

===
import Text.XML.HXT.Core
import Text.XML.HXT.Curl
import Text.XML.HXT.HTTP

pageURL = "http://localhost/test.xml"

main = do
  r <- runX (configSysVars [withCanonicalize no, withValidate no, withTrace 0, withParseHTML no]
             >>> readDocument [withErrors no, withWarnings no, withHTTP []] pageURL
             >>> getChildren >>> isElem >>> hasName "div"
             >>> listA (getChildren >>> hasName "span")
             >>> getTitle <+> getSections)
  putStrLn "Articles:"
  putStr ""
  mapM_ putStr $ map (\i -> (fst i) ++ ": " ++ (snd i) ++ " | ") r
  putStrLn ""

getTitle = arr head >>> getChildren >>> getText >>> arr trim >>> arr ("Title",)

getSections = arr tail >>> unlistA
              >>> ((getChildren >>> getText >>> arr trim)
                   &&& (getChildren >>> getText >>> arr trim))

ltrim [] = []
ltrim (' ':x) = ltrim x
ltrim ('\n':x) = ltrim x
ltrim ('\r':x) = ltrim x
ltrim ('\t':x) = ltrim x
ltrim x = x

rtrim = reverse . ltrim . reverse

trim = ltrim . rtrim
===

And I get this list:

[("Title", "Some story"), ("Description", "Description"), ("Author", "Author")]

(Maybe there is a better way to get this list?)

But I cannot find a way to get the text that follows a span. I suppose that I have to use a function from Data.Tree.NavigatableTree.XPathAxis, but I can't figure out how to do it.

Please, help me.
Re: [Haskell-cafe] HXT: how to get sibling element
I am not really familiar with XML parsing in Haskell, but I am wondering why, if you have an XML file, you do not simply name each element after the type of its contents:

<book>
  <title>Some Story</title>
  <description>This story about..</description>
  <author>Tom Smith</author>
</book>

Alternatively, you could use the class or some type attribute to indicate the type.

Regards,
Wilfried van Asten

2012/3/15 Никитин Лев <leon.v.niki...@pravmail.ru>:
> Hello, haskellers. Suppose we have this xml doc (maybe, little stupid): [...]
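With one element per field, as Wilfried suggests, the extraction needs no sibling navigation at all. A minimal sketch with HXT, run on an in-memory string instead of a URL; the names sample, fields and bookFields are made up for illustration, and the input is assumed to contain exactly the book element shown in the reply:

```haskell
import Text.XML.HXT.Core

-- Hypothetical sample input, copied from the suggested layout.
sample :: String
sample =
  "<book><title>Some Story</title>"
    ++ "<description>This story about..</description>"
    ++ "<author>Tom Smith</author></book>"

-- For every element child of the context node, pair the element
-- name with the text it contains.
fields :: LA XmlTree (String, String)
fields = getChildren >>> isElem >>> (getName &&& (getChildren >>> getText))

-- Parse the string, locate the <book> element, and collect its fields.
bookFields :: [(String, String)]
bookFields = runLA (xread >>> deep (isElem >>> hasName "book") >>> fields) sample
```

Loading this in GHCi and evaluating bookFields should list one (name, text) pair per child element. Because the element name carries the label, the special case for the first span disappears.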
[Haskell-cafe] HXT: how to get sibling element
I absolutely agree with you, but unfortunately it is not my XML file. It is an extract from an HTML page on a public web server, and I cannot change the format of that page. Sorry, I should have explained this in my first letter.

But then, what about getting the sibling text? (Getting a sibling is a separate, interesting task, regardless of my concrete case.)
Re: [Haskell-cafe] HXT: how to get sibling element
Oh, yes! In this situation, with such a poorly structured source, I can try to use tagsoup (or I'll take a look at xml-conduit). Nevertheless, for a better understanding of HXT, it would be interesting to solve this problem in HXT. Or is it impossible?

15.03.2012, 20:08, "Asten, W.G.G. van (Wilfried, Student B-TI)" <w.g.g.vanas...@student.utwente.nl>:
> You might want to check out the xml-conduit package. It has preceding and following sibling Axis. I am not sure how the package works exactly, but it seems to be a good starting point.
Re: [Haskell-cafe] HXT: how to get sibling element
ArrowNavigatableTree also provides a following-sibling axis:
http://hackage.haskell.org/packages/archive/hxt/9.2.2/doc/html/Control-Arrow-ArrowNavigatableTree.html

Wilfried

2012/3/15 Никитин Лев <leon.v.niki...@pravmail.ru>:
> Nevertheless, for a better understanding of HXT, it would be interesting to solve this problem in HXT. Or is it impossible?
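To make the pointer above more concrete, here is a rough sketch of how followingSiblingAxis from that module might be used on an in-memory copy of the document from the question. The names sample, toNav, fromNav, labelled and spanPairs are invented for this example, the two type-annotated wrappers only pin down the types of addNav and remNav, and the sample deliberately has no whitespace between elements. It has not been checked against every HXT release.

```haskell
import Control.Arrow.ArrowNavigatableTree (addNav, followingSiblingAxis, remNav)
import Text.XML.HXT.Core

-- Hypothetical sample input mirroring the document in the question.
sample :: String
sample =
  "<div><span>Some story</span>"
    ++ "<span>Description</span>: This story about..."
    ++ "<span>Author</span>: Tom Smith</div>"

-- Type-pinned wrappers around addNav / remNav.
toNav :: LA XmlTree XmlNavTree
toNav = addNav

fromNav :: LA XmlNavTree XmlTree
fromNav = remNav

-- For each <span> child: pair its text with the text of its
-- immediately following sibling. Spans whose next sibling is not a
-- text node (here: the title span) produce no result.
labelled :: LA XmlTree (String, String)
labelled =
  toNav
    >>> getChildren
    >>> filterA (fromNav >>> isElem >>> hasName "span")
    >>> ( (getChildren >>> fromNav >>> getText)
            &&& ((followingSiblingAxis >>. take 1) >>> fromNav >>> getText)
        )

spanPairs :: [(String, String)]
spanPairs = runLA (xread >>> deep (isElem >>> hasName "div") >>> labelled) sample
```

Using (>>. take 1) instead of listA followed by arr head avoids a crash when a span has no following sibling. The values still carry the leading ": " separator, so a cleanup step would be needed afterwards.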