Re: [Haskell-cafe] Message

2011-10-21 Thread Matti Oinas
I don't think I'm going to write the next Twitter or Facebook, but yes, it
is on my TODO list. If such applications can be written in languages like
PHP, then why not? I can't think of a language worse than PHP, yet there
are lots of web applications written in it. Even I have written many in
PHP.

Why would I use Haskell? To see whether it is a better fit for that problem
than other languages.

I have already installed Yesod, but for now I don't have enough time
to work on this project. In six months the situation should be
different.

2011/10/21 Michael Snoyman mich...@snoyman.com:
 This is clearly a job for node.js and the /dev/null data store, since
 they are so web scale~

 Less sarcasm: I think any of the main Haskell web frameworks (Yesod,
 Happstack, Snap) could scale better than Ruby or PHP, and I would use
 any of them in a heartbeat for such a venture. I'd personally use
 Yesod.

 I think the data store would be the trickier issue. I'd likely use one of
 the key/value stores out there, possibly Redis, though I'd really need
 to do more research to give a real answer.

 Michael
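
 A minimal sketch of that kind of key/value usage, assuming the hedis
 package (Database.Redis) and a Redis server on localhost; the key names
 here are purely illustrative:

 {-# LANGUAGE OverloadedStrings #-}
 import Database.Redis

 main :: IO ()
 main = do
   conn  <- connect defaultConnectInfo        -- localhost:6379 by default
   reply <- runRedis conn $ do
     _ <- set "user:42:name" "alice"          -- store a value under a key
     get "user:42:name"                       -- read it back
   print reply                                -- Right (Just "alice") on success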

 On Fri, Oct 21, 2011 at 9:42 AM, Yves Parès limestr...@gmail.com wrote:
 Wow, controversial point I guess...
 I would add: and if yes, what would you use and why?

 2011/10/21 Goutam Tmv vo1d_poin...@live.com

 Would you ever see yourself writing a web application like Twitter or
 Facebook in Haskell?


-- 
/***/

try {
    log.trace("Id=" + request.getUser().getId() + " accesses " +
        manager.getPage().getUrl().toString());
} catch (NullPointerException e) {}

/***/

This is real code, but please make the world a bit better place and
don't ever do it.

* http://www.javacodegeeks.com/2011/01/10-tips-proper-application-logging.html *

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Converting wiki pages into pdf

2011-09-08 Thread Matti Oinas
The whole Wikipedia database can also be downloaded, if that is any help.

http://en.wikipedia.org/wiki/Wikipedia:Database_download

There is also text on that site saying: "Please do not use a web
crawler to download large numbers of articles. Aggressive crawling of
the server can cause a dramatic slow-down of Wikipedia."

Matti

2011/9/9 Kyle Murphy orc...@gmail.com:
 It's worth pointing out at this point (as alluded to by Conrad) that what
 you're attempting might be considered somewhat rude, and possibly slightly
 illegal (depending on the insanity of the legal system in question).
 Automated site scraping (which is essentially what you're doing) is generally
 frowned upon by most hosts unless it follows some very specific guidelines,
 usually at a minimum respecting the restrictions specified in the robots.txt
 file at the domain's root. Furthermore, depending on the type of data in
 question, and on any EULA you agreed to if the site requires an account,
 automated processing might be disallowed. Now, I think Wikipedia has a fairly
 lenient policy, or at least I hope it does considering it's community driven,
 but depending on how much of Wikipedia you're planning to crawl, you should
 at the very least consider severely throttling the process to keep from
 sucking up too much bandwidth.
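
 A minimal sketch of that sort of throttling, using Network.HTTP (as in the
 code later in this thread) and threadDelay; the helper name and the pause
 length are purely illustrative:

 import Control.Concurrent (threadDelay)
 import Network.HTTP (getRequest, getResponseBody, simpleHTTP)

 -- Fetch URLs one at a time, sleeping between requests so the crawl
 -- never hits the server in rapid succession, e.g. politeFetch 5 urls.
 politeFetch :: Int -> [String] -> IO [String]
 politeFetch pauseSeconds = mapM fetchOne
   where
     fetchOne url = do
       body <- getResponseBody =<< simpleHTTP (getRequest url)
       threadDelay (pauseSeconds * 1000000)   -- threadDelay takes microseconds
       return body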

 On the topic of how to actually perform that crawl, you should probably
 check out the format of the link provided by the "download PDF" element.
 After looking at an article (note, I'm basing this off a quick glance at a
 single page) it looks like you should be able to modify the URL provided by
 the "Permanent link" element to generate the PDF link: change the title
 argument to arttitle, add a new title argument with the value Special:Book,
 and add the new arguments bookcmd=render_article and writer=rl. For example,
 if the permanent link to the article is:

 http://en.wikipedia.org/w/index.php?title=Shapinsay&oldid=449266269

 then the PDF URL is:

 http://en.wikipedia.org/w/index.php?arttitle=Shapinsay&oldid=449266269&title=Special:Book&bookcmd=render_article&writer=rl

 This is all rather hacky as well, and none of it has been tested, so it might
 not actually work, although I see no reason why it shouldn't. It's also
 fragile: if Wikipedia changes just about anything it could all break, but
 that's the risk you run any time you resort to site scraping.
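
 A quick sketch of that URL rewrite, assuming the permanent link always
 carries title and oldid query parameters; the helper names are mine and,
 in the same spirit as the above, untested:

 import Data.List (intercalate, isPrefixOf)

 -- Rewrite a Wikipedia permanent link into the Special:Book render URL
 -- described above: rename title to arttitle, then append the book arguments.
 permalinkToPdfUrl :: String -> String
 permalinkToPdfUrl link =
     renamed ++ "&title=Special:Book&bookcmd=render_article&writer=rl"
   where
     (base, query) = break (== '?') link
     params        = splitOn '&' (drop 1 query)
     fixParam p
       | "title=" `isPrefixOf` p = "art" ++ p    -- title=Foo  ->  arttitle=Foo
       | otherwise               = p
     renamed = base ++ "?" ++ intercalate "&" (map fixParam params)
     splitOn c s = case break (== c) s of
                     (piece, [])       -> [piece]
                     (piece, _ : rest) -> piece : splitOn c rest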

 -R. Kyle Murphy
 --
 Curiosity was framed, Ignorance killed the cat.


 On Thu, Sep 8, 2011 at 23:40, Conrad Parker con...@metadecks.org wrote:

 On Sep 9, 2011 7:33 AM, mukesh tiwari mukeshtiwari.ii...@gmail.com
 wrote:
 
  Thank you for the reply, Daniel. Considering my limited knowledge of web
  programming and JavaScript, first I need to simulate some sort of browser
  in my program which will run the JavaScript and generate the PDF. After
  that I can download the PDF. Is this what you mean? Is Network.Browser any
  help for this purpose? Is there a way to solve this problem?
  Sorry for the many questions, but this is my first web application program
  and I am trying hard to finish it.
 

 Have you tried finding out if simple URLs exist for this, that don't
 require Javascript? Does Wikipedia have a policy on this?

 Conrad.

 
  On Fri, Sep 9, 2011 at 4:17 AM, Daniel Patterson
  lists.hask...@dbp.mm.st wrote:
 
  It looks to me like the link is generated by JavaScript, so unless you
  can script an actual browser into the loop, it may not be a viable
  approach.
 
  On Sep 8, 2011, at 3:57 PM, mukesh tiwari wrote:
 
   I tried to use the PDF-generation facilities. I wrote a script which
   generates the rendering URL. When I paste the rendering URL into a
   browser it generates the download file, but when I try to get the tags,
   the result is empty. Could someone please tell me what is wrong with the
   code?
   Thank You
   Mukesh Tiwari
  
   import Network.HTTP
   import Text.HTML.TagSoup
   import Data.Maybe

   -- Return the PDF link if this open tag carries the "Download a PDF ..." title.
   parseHelp :: Tag String -> Maybe String
   parseHelp ( TagOpen _ y ) =
       if filter ( \( a , b ) -> b == "Download a PDF version of this wiki page" ) y /= []
         then Just $ "http://en.wikipedia.org" ++ ( snd $ y !! 0 )
         else Nothing

   -- Scan the tag list for the first open tag that yields a rendering URL.
   parse :: [ Tag String ] -> Maybe String
   parse [] = Nothing
   parse ( x : xs )
     | isTagOpen x = case parseHelp x of
                       Just s  -> Just s
                       Nothing -> parse xs
     | otherwise   = parse xs

   main :: IO ()
   main = do
         x <- getLine
         tags_1 <- fmap parseTags $ getResponseBody =<< simpleHTTP
                     ( getRequest x )                       -- open url
         let lst = head . sections ( ~== "<div class=\"portal\" id=\"p-coll-print_export\">" ) $ tags_1
             url = fromJust . parse $ lst                   -- rendering url
         putStrLn url
         tags_2 <- fmap parseTags $ getResponseBody =<< simpleHTTP
                     ( getRequest url )
         print tags_2