Thanks to Craig and Colin. I spent all day studying the Stack Overflow post and the document Colin provided. I now know some (not all) of the rules of the request payload, and I can use them to replace parameters. But deserializing the response is hard, and your answers do not cover it; my goal is still to parse the response received from the server.
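A minimal sketch of those request-payload rules, assuming the layout is version, flags, string count, the string table, and then payload tokens that refer to the string table by 1-based index. The class name `GwtRpcRequestSketch` and its methods are hypothetical helpers, not part of GWT, and the sketch assumes no string contains an escaped pipe character. The sample body is the request quoted further down in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not a GWT class): splits a GWT-RPC request body into its parts. */
final class GwtRpcRequestSketch {
    final String version;            // "7" in the quoted request
    final String flags;              // "0" in the quoted request
    final List<String> stringTable;  // payload tokens refer to these entries, 1-based
    final List<String> payload;      // everything after the string table

    private GwtRpcRequestSketch(String version, String flags,
                                List<String> stringTable, List<String> payload) {
        this.version = version;
        this.flags = flags;
        this.stringTable = stringTable;
        this.payload = payload;
    }

    /** Assumption: no string contains an escaped pipe, so a plain split is enough. */
    static GwtRpcRequestSketch parse(String body) {
        // String.split drops the empty token after the trailing '|'.
        List<String> tokens = Arrays.asList(body.split("\\|"));
        int tableSize = Integer.parseInt(tokens.get(2));
        int payloadStart = 3 + tableSize;
        return new GwtRpcRequestSketch(tokens.get(0), tokens.get(1),
                new ArrayList<>(tokens.subList(3, payloadStart)),
                new ArrayList<>(tokens.subList(payloadStart, tokens.size())));
    }

    /** Replaces one string table entry, for example a parameter value. */
    void setString(int oneBasedIndex, String value) {
        stringTable.set(oneBasedIndex - 1, value);
    }

    /** Re-joins header, string table and payload; every token is followed by '|'. */
    String encode() {
        StringBuilder sb = new StringBuilder();
        sb.append(version).append('|').append(flags).append('|')
          .append(stringTable.size()).append('|');
        for (String s : stringTable) sb.append(s).append('|');
        for (String p : payload) sb.append(p).append('|');
        return sb.toString();
    }

    public static void main(String[] args) {
        String body = "7|0|10|https://a.b.c.d/e|5C6CDB13D0FD25B266F3C36FA7FF6ED9|"
                + "a1.a2.a3.DataService|getCourseMembers|java.lang.Long/4227064769|"
                + "java.lang.String/2004016611|java.util.List|20204524|"
                + "java.util.Arrays$ArrayList/2507071751|20241|"
                + "1|2|3|4|3|5|6|7|5|TXbrzIAAA|8|9|1|6|10|";
        GwtRpcRequestSketch req = parse(body);
        System.out.println("method = " + req.stringTable.get(3));
        req.setString(10, "20242"); // hypothetical replacement value
        System.out.println(req.encode());
    }
}
```

Editing the string table and re-encoding keeps the payload indices untouched, which is why replacing a parameter value this way does not require understanding every payload token.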
I need to crawl 10000 users via GWT RPC. It is a one-time crawl for my service, so performance is not important. Again, I know some of the data from the server (response2.png, which I attached before). It is JSON that starts with //OK, followed by an array with 2045 elements (0-2044). Elements 0-2041 are confusing to me. Element 2042 is an array list whose entries appear in a jumbled order, so maybe the data above (elements 0-2041) contains their order. If you need the exact response payload, please reply and I will publish it.

On Wednesday, 9 October 2024 at 11:13:42 UTC+7, Craig Mitchell wrote:
> I second what Colin says "*Since you're scraping anyway, consider just
> scraping the results of the rendered page? This will likely take
> substantially more CPU time, but ridiculously less developer time to
> implement."*
>
> GWT RPC is not an API. It will constantly change as the website updates.
>
> I'd recommend either using a proper API (if one exists), or an
> off-the-shelf scraper tool.
>
> On Wednesday 9 October 2024 at 4:40:08 am UTC+11 Colin Alworth wrote:
>
>> I'd suggest reading the stream reader/writer subtypes of
>> AbstractSerializationStream to understand what all of the values are for -
>> in short, a gwt-rpc response is a payload and a string table, and the
>> payload's elements will reference the string table. You cannot know what the
>> structure is for certain without seeing the original Java types being
>> serialized, but often you can make good guesses.
>>
>> I'd also suggest reading stackoverflow posts and the like showing how to
>> deserialize other payloads just from context - here's a post that breaks
>> down a payload to understand its contents:
>> https://stackoverflow.com/questions/35047102/serializing-rpc-gwt/35047887#35047887
>>
>> If you haven't yet, read
>> https://docs.google.com/document/d/1eG0YocsYYbNAtivkLtcaiEE5IOF5u4LUol8-LL0TIKU/edit
>> as well.
>>
>> In short though, your response value is _probably_ a List of
>> CourseMember types - knowing that class will help you. I can't easily guess
>> more though, as the above doc says, the json array is read backwards, so
>> the important details would be right before and after the string array -
>> you have 1,7,2,1[...strings...] in the second image. From that I can say
>>
>> 1: if this was zero, it would be a null, since it is a positive number,
>> read the (value - 1) entry from the string table, which is ArrayList, so:
>> read a value of type ArrayList from the stream
>> 7: the ArrayList has 7 items
>> 2: first item in the arraylist - as above, if this was 0, it would be
>> null, since it is positive, read the (value - 1) entry from the string
>> table, and decode that type, so: read a CourseMember object from the payload
>> 1: this is _probably_ the number 1 in the first field of the first
>> CourseMember.
>> ...
>>
>> A parser continuing in this way, with knowledge of the structure of these
>> types could be written to decode this payload. I don't know of an
>> off-the-shelf tool that will do it for you in a truly automated way, but
>> could consult to write one, or guide your project in implementing one by
>> hand.
>>
>> Since you're scraping anyway, consider just scraping the results of the
>> rendered page? This will likely take substantially more CPU time, but
>> ridiculously less developer time to implement.
>>
>> On Tuesday, October 8, 2024 at 12:03:30 PM UTC-5 [email protected] wrote:
>>
>>> Thank you for the detailed response.
>>>
>>> I want to crawl data on a public website. I opened devtools and saw that
>>> it was written with GWT RPC.
>>>
>>> This is the body of the request I saw:
>>>
>>> 7|0|10|
>>> https://a.b.c.d/e|5C6CDB13D0FD25B266F3C36FA7FF6ED9|a1.a2.a3.DataService|getCourseMembers|java.lang.Long/4227064769|java.lang.String/2004016611|java.util.List|20204524|java.util.Arrays$ArrayList/2507071751|20241|1|2|3|4|3|5|6|7|5|TXbrzIAAA|8|9|1|6|10|
>>>
>>> As you can see, there is no problem with that syntax. I can understand it
>>> roughly, and I know the method is getCourseMembers. I want to build a
>>> function that returns the body above, like:
>>> public static String getBodyEncoded(String methodName,
>>> ... String methodBody ...) or something similar, which returns the body
>>> above to send to the server.
>>>
>>> I also want to understand the last part of the request syntax:
>>> 1|2|3|4|1|5|6|7|7|8|7|9|7|10|7|11|7|12|7|13|7|14|
>>>
>>> Next is the response body. This is the real problem. The response is
>>> very long, so I put it in the attached files.
>>>
>>> I saw a JSON array with more than 2000 elements, and I cannot understand
>>> what they are. The only thing I understand is the 2042nd element, which
>>> contains an unordered list. Maybe some of the elements before it contain
>>> data about the order.
>>>
>>> I want to build a method to extract/deserialize this response.
>>>
>>> I am a newbie. If my question can be answered, can you guide me with
>>> more details, please?
>>> Java is good, but other languages are acceptable; I can still deploy it.
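Colin's walk-through of the values 1, 7, 2, 1 can be sketched as a small backwards reader. This is only a sketch under stated assumptions: the `//OK` prefix has already been stripped and the remaining array has been parsed into a `List` by some JSON library, the last three elements are the string table, the flags and the version (which matches the string list sitting at element 2042 of 2045), and tokens are consumed from the end toward the start. The class `GwtRpcResponseReaderSketch` and the type signatures in the sample are hypothetical. Decoding the fields of CourseMember would still require knowing that class.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical reader (not a GWT class) that walks a decoded response array backwards. */
final class GwtRpcResponseReaderSketch {
    private final List<?> tokens;       // the whole decoded array
    private final List<?> stringTable;  // third element from the end
    private int index;                  // next token to read, moving toward 0

    GwtRpcResponseReaderSketch(List<?> jsonArray) {
        int n = jsonArray.size();
        this.tokens = jsonArray;
        // Assumed tail layout: [..., stringTable, flags, version]
        this.stringTable = (List<?>) jsonArray.get(n - 3);
        this.index = n - 4; // the token right before the string table is read first
    }

    boolean hasMore() {
        return index >= 0;
    }

    int readInt() {
        return ((Number) tokens.get(index--)).intValue();
    }

    /** 0 means null; a positive value refers to string table entry (value - 1). */
    String readString() {
        int ref = readInt();
        return ref == 0 ? null : (String) stringTable.get(ref - 1);
    }

    public static void main(String[] args) {
        // Made-up sample: reading backwards from the string table yields 1, 7, 2, 1.
        List<Object> response = Arrays.<Object>asList(
                1, 2, 7, 1,
                Arrays.asList("java.util.ArrayList/1111111111",
                              "a1.a2.a3.CourseMember/2222222222"),
                0, 7);
        GwtRpcResponseReaderSketch in = new GwtRpcResponseReaderSketch(response);
        System.out.println("container type: " + in.readString());
        System.out.println("item count:     " + in.readInt());
        System.out.println("first item:     " + in.readString());
        System.out.println("first field:    " + in.readInt());
    }
}
```

A full decoder would continue from here with one read method per Java type, calling `readInt` or `readString` in the same order in which the fields of each class were serialized.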
