Re: [Haskell-cafe] TagSoup 0.9

2010-05-25 Thread Neil Mitchell
Hi,

From what I can tell of your example you've managed to get the raw
HTTP response in Unicode, which isn't suitable for sending to tagsoup.
I've not used the Network.HTTP library for downloading much, but when
I did I thought it stripped the headers automatically.

Can you just print the first few lines of the output you get from the
HTTP library, without passing them through tagsoup. That should show
the problem independent of tagsoup.
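One way to run the check Neil suggests is a minimal sketch that uses only base. `looksLikeRawHttp`, `splitHttpResponse` and `previewLines` are hypothetical helper names, not part of TagSoup or the HTTP package. If the string handed to `parseTags` still starts with a status line such as `HTTP/1.1 200 OK`, the headers were not stripped. In that case everything before the first blank line (`\r\n\r\n`) is header text.

```haskell
import Data.List (isPrefixOf)

-- | True if the text still begins with an HTTP status line such as
-- "HTTP/1.1 200 OK", i.e. the headers were not stripped before parsing.
looksLikeRawHttp :: String -> Bool
looksLikeRawHttp = ("HTTP/" `isPrefixOf`)

-- | Split a raw response at the first blank line ("\r\n\r\n") into
-- (header block, body). With no blank line, everything counts as headers.
splitHttpResponse :: String -> (String, String)
splitHttpResponse = go []
  where
    go acc s
      | "\r\n\r\n" `isPrefixOf` s = (reverse acc, drop 4 s)
    go acc (c : cs) = go (c : acc) cs
    go acc []       = (reverse acc, [])

-- | The first n lines of whatever the download returned, for printing
-- without passing it through tagsoup.
previewLines :: Int -> String -> String
previewLines n = unlines . take n . lines
```

This does not undo chunked transfer encoding. In Ralph's dump, the `4000\r\n` that follows the headers is a chunk-size line (0x4000 bytes), and a proper HTTP client would normally remove it.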

Thanks, Neil


On Mon, May 24, 2010 at 3:24 AM, Ralph Hodgson <rhodg...@topquadrant.com> wrote:
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


RE: [Haskell-cafe] TagSoup 0.9

2010-05-23 Thread Ralph Hodgson
Thanks Neil,

Using Network.HTTP worked.

However, something else I have just run into concerns some web pages that
start with:

<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

I get the following bad result:

TagText "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nLast-Modified: Tue,
27 Oct 2009 19:30:40 GMT\r\nETag: \"6f248cf73b57ca1:25e2\"\r\nDate: Sun, 23
May 2010 22:46:41 GMT\r\nTransfer-Encoding:  chunked\r\nConnection:
close\r\nConnection:
Transfer-Encoding\r\n\r\n4000\r\n\255\254<\NUL?\NULx\NULm\NULl\NUL
\NULv\NULe\NULr\NULs\NULi\NULo\NULn\NUL=\NUL\"\NUL1\NUL.\NUL0\NUL\"\NUL
\NULe\NULn\NULc\NULo\NULd\NULi\NULn\NULg\NUL=\NUL\"\NULi\NULs\NULo\NUL-\NUL8\NUL8\NUL5\NUL9\NUL-\NUL1\NUL\"\NUL

etc etc

Is this an easy thing to fix? I've started to look over the code.
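The bytes after the headers begin with `\255\254`, which is the UTF-16 little-endian byte-order mark (FF FE), and every other byte after that is `\NUL`. So the page body appears to be UTF-16LE, even though its XML declaration says iso-8859-1. The following is a sketch of a base-only decoder that could run before `parseTags`. It assumes the body has already been separated from the headers and de-chunked, and that each `Char` of the `String` holds one raw byte. `decodeUtf16LE` is a hypothetical name, not a TagSoup or HTTP function.

```haskell
import Data.Bits (shiftL, (.|.))
import Data.Char (chr, ord)

-- | Decode a String whose Chars are raw bytes (0..255) as UTF-16LE.
-- A leading FF FE byte-order mark ("\255\254") is dropped if present.
-- Surrogate pairs are combined, an unpaired surrogate becomes U+FFFD,
-- and a trailing odd byte is ignored.
decodeUtf16LE :: String -> String
decodeUtf16LE ('\255' : '\254' : rest) = decodeUnits (toUnits rest)
decodeUtf16LE bytes                    = decodeUnits (toUnits bytes)

-- | Pair up bytes, low byte first, into 16-bit code units.
toUnits :: String -> [Int]
toUnits (lo : hi : rest) = (ord hi `shiftL` 8 .|. ord lo) : toUnits rest
toUnits _                = []

-- | Turn 16-bit code units into Chars.
decodeUnits :: [Int] -> String
decodeUnits (hi : lo : rest)
  | isHigh hi && isLow lo =
      chr (0x10000 + ((hi - 0xD800) `shiftL` 10) + (lo - 0xDC00)) : decodeUnits rest
decodeUnits (u : rest)
  | isHigh u || isLow u = '\xFFFD' : decodeUnits rest
  | otherwise           = chr u : decodeUnits rest
decodeUnits [] = []

isHigh, isLow :: Int -> Bool
isHigh u = u >= 0xD800 && u <= 0xDBFF
isLow u  = u >= 0xDC00 && u <= 0xDFFF
```

Applied to the start of the body above, this yields `<?xml`. Working on a strict byte-oriented representation would be faster, but a `String` keeps the sketch close to the `Tag String` code in this thread.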

 



[Haskell-cafe] TagSoup 0.9

2010-05-19 Thread Ralph Hodgson
Hello Neil,

I was using TagSoup 0.8 with great success. On upgrading to 0.9 I have this
error:

TQ\TagSoup\TagSoupExtensions.lhs:29:17:
    `Tag' is not applied to enough type arguments
    Expected kind `*', but `Tag' has kind `* -> *'
    In the type synonym declaration for `Bundle'
Failed, modules loaded: TQ.Common.TextAndListHandling.

where line 29 is the type declaration for 'Bundle' in the following code:

 module TQ.TagSoup.TagSoupExtensions where

 import TQ.Common.TextAndListHandling
 import Text.HTML.TagSoup
 import Text.HTML.Download
 import Control.Monad
 import Data.List
 import Data.Char

 type Bundle = [Tag]

[snip]

 tagsOnPage :: String -> IO String
 tagsOnPage url = do
   tags <- liftM parseTags $ openURL url
   let results = unlines $ map show tags
   return results

 extractTags :: Tag -> Tag -> [Tag] -> [Tag]
 extractTags fromTag toTag tags =
   takeWhile (~/= toTag) $ dropWhile (~/= fromTag) tags

 extractTagsBetween :: Tag -> [Tag] -> [Tag]
 extractTagsBetween _ [] = []
 extractTagsBetween markerTag tags =
   if startTags == []
     then []
     else [head startTags] ++ takeWhile (~/= markerTag) (tail startTags)
   where
     startTags = dropWhile (~/= markerTag) tags

I need to repair this code quickly and am hoping you can help me resolve
it. Thanks.
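The `if startTags == [] ... head ... tail` shape in `extractTagsBetween` can also be written as a single pattern match. This avoids the partial `head`/`tail` and the `Eq` constraint that `== []` needs. The following is a minimal sketch, generalised over a predicate so that it needs only base. `extractBetween` and `extractFromTo` are hypothetical names. With TagSoup, one would pass a predicate such as `(~== markerTag)`, because `~/=` is the negation of `~==`.

```haskell
-- | Pattern-matching version of extractTagsBetween, generalised over a
-- predicate: the first element satisfying isMarker, followed by the
-- elements up to (not including) the next one that satisfies it.
-- Returns [] when nothing matches.
extractBetween :: (a -> Bool) -> [a] -> [a]
extractBetween isMarker xs =
  case dropWhile (not . isMarker) xs of
    []         -> []
    (m : rest) -> m : takeWhile (not . isMarker) rest

-- | Same shape as extractTags: skip to the first element matching isFrom,
-- then keep elements until one matches isTo.
extractFromTo :: (a -> Bool) -> (a -> Bool) -> [a] -> [a]
extractFromTo isFrom isTo = takeWhile (not . isTo) . dropWhile (not . isFrom)
```

Because the helpers only take predicates, they work unchanged for `Tag`, `Tag String`, or any other element type.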

 

Ralph Hodgson, 

@ralphtq http://twitter.com/ralphtq 



Re: [Haskell-cafe] TagSoup 0.9

2010-05-19 Thread Malcolm Wallace
Neil says that the API of TagSoup changed in 0.9.
All usages of the type Tag should now take a type argument, e.g. Tag String.


Regards,
Malcolm

 
On Wednesday, May 19, 2010, at 08:05AM, Ralph Hodgson
<rhodg...@topquadrant.com> wrote:


RE: [Haskell-cafe] TagSoup 0.9

2010-05-19 Thread Ralph Hodgson
Thanks Malcolm,

Providing a 'String' type argument worked:

 type Bundle = [Tag String]

 extractTags :: Tag String -> Tag String -> Bundle -> Bundle
 extractTags fromTag toTag tags =
   takeWhile (~/= toTag) $ dropWhile (~/= fromTag) tags



RE: [Haskell-cafe] TagSoup 0.9

2010-05-19 Thread Ralph Hodgson
Forgot to add: I now need to understand the following warnings on the line
"import Text.HTML.Download":

TagSoupExtensions.lhs:24:2:
    Warning: In the use of `openItem'
             (imported from Text.HTML.Download):
             Deprecated: "Use package HTTP, module Network.HTTP, getResponseBody =<< simpleHTTP (getRequest url)"

TagSoupExtensions.lhs:24:2:
    Warning: In the use of `openURL'
             (imported from Text.HTML.Download):
             Deprecated: "Use package HTTP, module Network.HTTP, getResponseBody =<< simpleHTTP (getRequest url)"
Ok, modules loaded: TQ.TagSoup.TagSoupExtensions.
*TQ.TagSoup.TagSoupExtensions>



Re: [Haskell-cafe] TagSoup 0.9

2010-05-19 Thread Daniel Fischer
On Wednesday 19 May 2010 19:46:57, Ralph Hodgson wrote:
> Forgot to add: I now need to understand the following warnings on this
> line  import Text.HTML.Download:

In Text.HTML.Download, there's the following:

{-|
/DEPRECATED/: Use the HTTP package instead:

> import Network.HTTP
> openURL x = getResponseBody =<< simpleHTTP (getRequest x)

This module simply downloads a page off the internet. It is very restricted,
and it not intended for proper use.

The original version was by Alistair Bayley, with additional help from
Daniel McAllansmith. It is taken from the Haskell-Cafe mailing list
\"Simple HTTP lib for Windows?\", 18 Jan 2007.
http://thread.gmane.org/gmane.comp.lang.haskell.cafe/18443/
-}

and

{-# DEPRECATED openItem, openURL "Use package HTTP, module Network.HTTP, getResponseBody =<< simpleHTTP (getRequest url)" #-}

So, don't use Text.HTML.Download anymore; instead, use the functions from
the HTTP package.

Deprecated stuff will probably be removed in one of the next releases.
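Following the deprecation message, `tagsOnPage` from Ralph's module could be ported to the HTTP package roughly as follows. This is a sketch that assumes the HTTP and tagsoup packages are installed. `openURL` here is a local definition copied from the deprecation note, not a library export, and `showTags` is a hypothetical helper. As Henning points out in this thread, this replacement reads the whole body before returning and is not lazy.

```haskell
import Network.HTTP (getRequest, getResponseBody, simpleHTTP)
import Text.HTML.TagSoup (Tag, parseTags)

-- | Local replacement for the deprecated Text.HTML.Download.openURL,
-- written exactly as the deprecation message suggests.
openURL :: String -> IO String
openURL url = getResponseBody =<< simpleHTTP (getRequest url)

-- | One shown tag per line, as the original tagsOnPage produced.
showTags :: [Tag String] -> String
showTags = unlines . map show

-- | Download a page and list its tags, using Tag String as TagSoup 0.9 requires.
tagsOnPage :: String -> IO String
tagsOnPage url = fmap (showTags . parseTags) (openURL url)
```

Keeping `showTags` pure means you can check it on a literal string, with no network access.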


Re: [Haskell-cafe] TagSoup 0.9

2010-05-19 Thread Don Stewart
Or use things from the download-curl package, which provides a nice
openURL function.

daniel.is.fischer:
> On Wednesday 19 May 2010 19:46:57, Ralph Hodgson wrote:
> > Forgot to add: I now need to understand the following warnings on this
> > line  import Text.HTML.Download:
>
> In Text.HTML.Download, there's the following:
>
> {-|
> /DEPRECATED/: Use the HTTP package instead:
>
> > import Network.HTTP
> > openURL x = getResponseBody =<< simpleHTTP (getRequest x)


Re: [Haskell-cafe] TagSoup 0.9

2010-05-19 Thread Neil Mitchell
Hi Ralph,

> I was using TagSoup 0.8 with great success. On upgrading to 0.9 I have this
> error:
>
> TQ\TagSoup\TagSoupExtensions.lhs:29:17:
>     `Tag' is not applied to enough type arguments
>     Expected kind `*', but `Tag' has kind `* -> *'
>     In the type synonym declaration for `Bundle'
> Failed, modules loaded: TQ.Common.TextAndListHandling.

My change notes have this being a change between 0.6 and 0.8. As
Malcolm says, any old uses of Tag should become Tag String. The
reason is that Tag is now parameterised, and you can use Tag
ByteString etc. However, I should point out that Tag ByteString won't
be any faster than Tag String in this version (it's in the future work
pile).
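A simplified stand-in for a parameterised tag type shows why GHC reports that `Tag` has kind `* -> *`. It is not TagSoup's real definition, which has more constructors, and `textOf` is a hypothetical helper. Once the string type is a parameter, `Tag` on its own is no longer a type of kind `*`, so a synonym has to fix the argument, as in `type Bundle = [Tag String]`.

```haskell
-- Simplified stand-in for a parameterised Tag (NOT TagSoup's definition):
-- the string type is a parameter, so Tag :: * -> *.
data Tag str
  = TagOpen str [(str, str)]
  | TagClose str
  | TagText str
  deriving (Eq, Show)

-- type Bundle = [Tag]        -- rejected: Tag is missing its type argument
type Bundle = [Tag String]    -- accepted: Tag String :: *

-- Mapping over the string type is what makes representations such as
-- Tag ByteString possible in principle.
instance Functor Tag where
  fmap f (TagOpen n as) = TagOpen (f n) [(f k, f v) | (k, v) <- as]
  fmap f (TagClose n)   = TagClose (f n)
  fmap f (TagText t)    = TagText (f t)

-- | All text nodes in a bundle.
textOf :: Bundle -> [String]
textOf bs = [t | TagText t <- bs]
```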

> Forgot to add: I now need to understand the following warnings on this
> line  import Text.HTML.Download:

Everyone's comments have been right. I previously included
Text.HTML.Download so that it was easy to test tagsoup against the
web. Since I first wrote that snippet the HTTP downloading libraries
have improved substantially, so people should use those in favour of
the version in tagsoup - you'll be able to connect to more websites in
more reliable ways, go through proxies etc. I don't intend to remove
the Download module any time soon, but I will do eventually.

Thanks, Neil


Re: [Haskell-cafe] TagSoup 0.9

2010-05-19 Thread Henning Thielemann
Don Stewart wrote:
> Or use things from the download-curl package, which provides a nice
> openURL function.

The openURL function from TagSoup is lazy, which the proposed
replacement 'getResponseBody =<< simpleHTTP (getRequest x)' is not. Is
the openURL function from download-curl lazy?



Re: [Haskell-cafe] TagSoup 0.9

2010-05-19 Thread Don Stewart
schlepptop:
> Don Stewart wrote:
> > Or use things from the download-curl package, which provides a nice
> > openURL function.
>
> The openURL function from TagSoup is lazy, which the proposed
> replacement 'getResponseBody =<< simpleHTTP (getRequest x)' is not. Is
> the openURL function from download-curl lazy?

Yes, see:

Network.Curl.Download.Lazy.openLazyURI

though I think it is possible that I strictified the code. Have a play
around with it if it doesn't meet your needs -- should be /trivial/ to
ensure it is chunk-wise lazy.