I second what Colin says: "*Since you're scraping anyway, consider just scraping the results of the rendered page? This will likely take substantially more CPU time, but ridiculously less developer time to implement.*"
GWT RPC is not an API. It will constantly change as the website updates. I'd recommend either using a proper API (if one exists) or an off-the-shelf scraper tool.

On Wednesday 9 October 2024 at 4:40:08 am UTC+11 Colin Alworth wrote:
> I'd suggest reading the stream reader/writer subtypes of AbstractSerializationStream to understand what all of the values are for. In short, a GWT-RPC response is a payload and a string table, and the payload's elements reference the string table. You cannot know the structure for certain without seeing the original Java types being serialized, but often you can make good guesses.
>
> I'd also suggest reading Stack Overflow posts and the like that show how to deserialize other payloads just from context. Here's a post that breaks down a payload to understand its contents:
> https://stackoverflow.com/questions/35047102/serializing-rpc-gwt/35047887#35047887
>
> If you haven't yet, read
> https://docs.google.com/document/d/1eG0YocsYYbNAtivkLtcaiEE5IOF5u4LUol8-LL0TIKU/edit
> as well.
>
> In short, though, your response value is _probably_ a List of CourseMember types; knowing that class will help you. I can't easily guess more, though. As the above doc says, the JSON array is read backwards, so the important details are right before and after the string array. You have 1,7,2,1[...strings...] in the second image. From that I can say:
>
> 1: if this were zero, it would be a null. Since it is a positive number, read the (value - 1) entry from the string table, which is ArrayList. So: read a value of type ArrayList from the stream.
> 7: the ArrayList has 7 items.
> 2: the first item in the ArrayList. As above, if this were 0 it would be null; since it is positive, read the (value - 1) entry from the string table and decode that type. So: read a CourseMember object from the payload.
> 1: this is _probably_ the number 1 in the first field of the first CourseMember.
> ...
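The backwards reading Colin walks through can be sketched as a small Java reader. This is a minimal, hypothetical sketch, not GWT's own reader (the real logic lives in the AbstractSerializationStream reader subclasses). It assumes the response has already been split, with any JSON library, into the flat payload tokens that precede the string table plus the string table itself, and that the tokens being read are plain integers. The class `GwtRpcResponseReader`, its methods, and the one-int-field layout assumed for each list element are all made up for illustration; a real CourseMember will have more fields whose order you must guess or learn from the Java class.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: consumes GWT-RPC payload tokens from the end towards the start. */
public class GwtRpcResponseReader {
    private final List<String> payload;     // raw payload tokens, in array order
    private final List<String> stringTable; // the nested string array of the response
    private int pos;                        // index of the next token to read

    public GwtRpcResponseReader(List<String> payload, List<String> stringTable) {
        this.payload = payload;
        this.stringTable = stringTable;
        this.pos = payload.size() - 1; // the token just before the string table is read first
    }

    /** Reads the next token as an int, moving backwards. */
    public int readInt() {
        if (pos < 0) {
            throw new IllegalStateException("payload exhausted");
        }
        return Integer.parseInt(payload.get(pos--));
    }

    /** Reads a string-table reference: 0 means null, n > 0 means entry n - 1. */
    public String readString() {
        int ref = readInt();
        if (ref < 0) {
            // Negative tokens refer back to objects already read; not handled in this sketch.
            throw new UnsupportedOperationException("back-reference " + ref);
        }
        return ref == 0 ? null : stringTable.get(ref - 1);
    }

    /** Reads a list whose elements are ASSUMED to hold a single int field each. */
    public List<Integer> readListOfSingleIntObjects() {
        String listType = readString(); // list type signature, e.g. ArrayList
        if (listType == null) {
            return null;                // a 0 here encodes a null list
        }
        int size = readInt();           // element count
        List<Integer> firstFields = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            String elementType = readString(); // element type signature, e.g. CourseMember
            if (elementType == null) {
                firstFields.add(null);          // null element
            } else {
                firstFields.add(readInt());     // hypothetical: the object's only field
            }
        }
        return firstFields;
    }

    public static void main(String[] args) {
        // Synthetic data shaped like the walk-through, NOT the real response.
        List<String> table = List.of("java.util.ArrayList/HASH", "x.y.CourseMember/HASH");
        // Read order from the end: 1 (ArrayList), 2 (size), 2 (CourseMember), 1, 2 (CourseMember), 5
        List<String> payload = List.of("5", "2", "1", "2", "2", "1");
        GwtRpcResponseReader reader = new GwtRpcResponseReader(payload, table);
        System.out.println(reader.readListOfSingleIntObjects()); // prints [1, 5]
    }
}
```

Extending this to the real payload means adding one read method per field of the actual types, in the order the server wrote them, which is exactly the guesswork Colin describes.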
>
> A parser continuing in this way, with knowledge of the structure of these types, could be written to decode this payload. I don't know of an off-the-shelf tool that will do it for you in a truly automated way, but I could consult on writing one, or guide your project in implementing one by hand.
>
> Since you're scraping anyway, consider just scraping the results of the rendered page? This will likely take substantially more CPU time, but ridiculously less developer time to implement.
>
> On Tuesday, October 8, 2024 at 12:03:30 PM UTC-5 [email protected] wrote:
>
>> Thank you for the detailed response.
>>
>> I want to crawl data from a public website. I opened DevTools and saw that it uses GWT RPC.
>>
>> This is the request body I saw:
>>
>> 7|0|10|https://a.b.c.d/e|5C6CDB13D0FD25B266F3C36FA7FF6ED9|a1.a2.a3.DataService|getCourseMembers|java.lang.Long/4227064769|java.lang.String/2004016611|java.util.List|20204524|java.util.Arrays$ArrayList/2507071751|20241|1|2|3|4|3|5|6|7|5|TXbrzIAAA|8|9|1|6|10|
>>
>> As you can see, I have no problem with that syntax; I understand it roughly and know that the method is getCourseMembers. I want to build a function that returns the body above, like:
>> public static String getBodyEncoded(String methodName, ... String methodBody ...) or something similar, which returns the body above to send to the server.
>>
>> I also want to understand the last part of the request syntax:
>> 1|2|3|4|1|5|6|7|7|8|7|9|7|10|7|11|7|12|7|13|7|14|
>>
>> Next is the response body. This is the real problem. The response is very long; I put it in the attached files.
>>
>> I saw a JsonArray with more than 2000 elements, and I cannot understand what they are. The only thing I understand is the 2042nd element: it contains an unordered list. Maybe some of the preceding elements contain data about the order.
>>
>> I want to build a method to extract/deserialize this response.
>>
>> I am a newbie. If my question can be answered, can you guide me in more detail, please?
>> Java is good, but other languages are acceptable; I can still deploy it.
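For the request side of the question (the getBodyEncoded idea), a minimal Java sketch can split a body like the getCourseMembers one into version, flags, string table, and payload tokens, and reassemble it after a value is changed. The class name `GwtRpcRequestBody` and its methods are hypothetical, not part of GWT. In the body above, the payload's first tokens 1|2|3|4 appear to reference the module base URL, the strong name, the service interface, and the method name, followed by the parameter count (3) and parameter type references (5|6|7); the remaining tokens mix string-table references with encoded values such as TXbrzIAAA, so which token is an index cannot be decided without knowing the method signature. The sketch also assumes no string contains `|` or a backslash, since GWT escapes those and this code does not undo the escaping.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for a pipe-delimited GWT-RPC request body. */
public class GwtRpcRequestBody {
    public final int version;              // first token, e.g. 7
    public final int flags;                // second token, e.g. 0
    public final List<String> stringTable; // the strings referenced by 1-based index
    public final List<String> payload;     // raw tokens: indices, counts, and encoded values

    public GwtRpcRequestBody(int version, int flags, List<String> stringTable, List<String> payload) {
        this.version = version;
        this.flags = flags;
        this.stringTable = stringTable;
        this.payload = payload;
    }

    /** Splits "version|flags|count|s1|...|sN|p1|...|pM|" into its parts. */
    public static GwtRpcRequestBody parse(String body) {
        // String.split drops the trailing empty token produced by the final '|'.
        List<String> t = Arrays.asList(body.split("\\|"));
        int version = Integer.parseInt(t.get(0));
        int flags = Integer.parseInt(t.get(1));
        int count = Integer.parseInt(t.get(2));
        List<String> strings = new ArrayList<>(t.subList(3, 3 + count));
        List<String> rest = new ArrayList<>(t.subList(3 + count, t.size()));
        return new GwtRpcRequestBody(version, flags, strings, rest);
    }

    /** Looks up a 1-based string-table reference; 0 means null. */
    public String resolve(int ref) {
        return ref == 0 ? null : stringTable.get(ref - 1);
    }

    /** Reassembles the body, e.g. after replacing a parameter value in the string table. */
    public String encode() {
        StringBuilder sb = new StringBuilder();
        sb.append(version).append('|').append(flags).append('|').append(stringTable.size()).append('|');
        for (String s : stringTable) {
            sb.append(s).append('|');
        }
        for (String p : payload) {
            sb.append(p).append('|');
        }
        return sb.toString();
    }
}
```

As a usage example, parsing the getCourseMembers body and calling `resolve(4)` should yield `getCourseMembers`, and `stringTable.set(7, ...)` followed by `encode()` would produce a body with a different value in place of `20204524`. Reusing a captured body as a template like this is usually simpler than generating one from scratch, but it will break whenever the site redeploys with a new strong name.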
